ESTIMATION AND CONFIDENCE REGIONS FOR PARAMETER SETS

IN  ECONOMETRIC  MODELS* 

VICTOR CHERNOZHUKOV†   HAN HONG§   ELIE TAMER*

Abstract. The paper develops estimation and inference methods for econometric models
with partial identification, focusing on models defined by moment inequalities and equalities.
Main applications of this framework include the analysis of game-theoretic models, revealed
preference, regression with missing and mismeasured data, auction models, bounds in structural
quantile models, and bounds in asset pricing, among many others.

Specifically, this paper provides estimators and confidence regions for minima of an
econometric criterion function $Q(\theta)$. In applications, $Q(\theta)$ embodies testable restrictions on
economic models. A parameter $\theta$ that describes an economic model passes these restrictions if
$Q(\theta)$ attains the minimum value, normalized to be zero. Interest therefore focuses on
the set of parameters $\Theta_I$ that minimizes $Q(\theta)$, called the identified set. This paper uses
the inversion of the sample analog $Q_n(\theta)$ of the population criterion $Q(\theta)$ to construct the
estimators and confidence regions for $\Theta_I$. We develop consistency, rates of convergence, and
inference results for these estimators and regions. The results are shown to hold under general
yet simple conditions, and practical procedures are provided to implement the approach. In
order to derive these results, the paper also develops methods for analyzing the asymptotics
of sample criterion functions under set identification.
1. Introduction

Parameters  of  interest  in  econometric  models  can  be  defined  as  values  that  minimize  a 
population  criterion  function.  If  this  criterion  function  is  minimized  uniquely  at  a  particular 
parameter  vector,  then  one  can  obtain  confidence  regions  for  this  parameter  using  a  sample 
analog of this function. This paper extends this criterion-based estimation and inference to
econometric  models  where  the  objective  function  is  minimized  on  a  set  of  parameters,  the 
identified  set.  (The  terminology  follows  Manski  (2003).)  Our  goal  is  to  estimate  and  make 
inferences  directly  on  the  identified  set.  The  development  focuses  on  moment  condition  models 
defined  by  either  moment  inequalities  or  moment  equalities. 

This  paper  uses  the  inversion  of  the  sample  criterion  functions  as  the  building  principle 
for  estimators  and  confidence  regions.  The  resulting  estimators  and  confidence  regions  are 
appropriate  contour  sets  of  the  sample  criterion  functions.  The  paper  develops  consistency, 
rates of convergence, and inference results for these sets. Specifically, this paper shows that an
appropriate lower contour set of the sample criterion function converges in the Hausdorff metric
to the identified set at (exactly or arbitrarily close to) the $1/\sqrt{n}$ rate in moment condition
problems, and at polynomial rates more generally. The paper develops a method for
determining the appropriate level of the contour set so that it covers the identified set with a
prespecified probability. For this purpose, the paper derives the asymptotics of several
inferential statistics whose quantiles determine the appropriate level of the contour set. The lack
of equicontinuous behavior of the sample criterion functions in moment inequality problems
poses challenges to this analysis.

The primary applications of the estimation and inference methods developed in this paper
are in such areas as (1) empirical game-theoretic models, (2) empirical revealed preference
analysis, (3) econometric analysis with missing and mismeasured data, (4) bounds analysis
in auction models, (5) structural quantile models and other simultaneous equation models
without additivity, (6) bounds analysis in asset pricing models, and (7) inference on
dominance regions in stochastic dominance analysis, among others. In most of these problems,
the economic models of interest satisfy a collection of moment inequalities, and the resulting
criterion functions are typically minimized on a set. Our paper develops estimators and
confidence regions for these sets.


In the context of estimation of games and revealed preference analysis, our methods have
already been employed by several substantive empirical studies. Bajari, Benkard, and Levin
(2006) estimated a dynamic Markov game where the observed action of each player
satisfies discrete optimality conditions for equilibrium, which result in moment inequality
conditions. Beresteanu and Ellickson (2006) presented a further application to a study of dynamic
oligopolistic competition. Ciliberto and Tamer (2003) analyze empirical entry models with
multiple equilibria. They do not make equilibrium selection assumptions, which leads
to moment inequality conditions and set identification. Cohen and Manuszak (2006)
estimate a game in which firms may enter as different types and which has multiple equilibria in
types. They also do not impose equilibrium selection assumptions. Borzekowski and Cohen
(2004) estimate a model of strategic complementarity between credit unions in their choice of
adopting a technology or outsourcing it. In the context of simultaneous equation models with
non-additive disturbances, Chernozhukov and Hansen (2004) estimate a demand model where
partial identification occurs. In fact, they use the pointwise versions of the inference methods
developed here. Econometric analysis with missing and mismeasured data is another area of
application of our methods: Molinari (2004) applied our methods to construct a confidence
region for the identified set in a causal model with missing treatments. In auction analysis,
the very nature of auction mechanisms also often leads to the missing data framework; see
Haile and Tamer (2003).

In addition to these existing applications, there appear to be many more potential
applications to revealed preference analysis, e.g. see Varian (1982, 1984), McFadden (2005),
Blundell, Browning, and Crawford (2005), and Bajari, Fox, and Ryan (2006). A potentially
important application is inference on the set of asset pricing models that satisfy the mean
and volatility bounds developed in Hansen, Heaton, and Luttmer (1995). Yet another area
of potential applications is the analysis of stochastic dominance relations, e.g. see Linton,
Post, and Wang (2005).

The  relationship  of  this  paper  to  the  econometric  literature  on  inference  under  partial 
identification  is  as  follows.^  The  concepts  of  set  identification  go  back  to  Frisch  (1934)  and 
Marschak  and  Andrews  (1944).  Marschak  and  Andrews  (1944)  constructed  the  identified  set 


^See Manski (2003) for a detailed introduction to partial identifiability.
as a collection of parameters representing different production functions that cannot be
rejected by the data and that are consistent with the functional restrictions the authors consider.
Frisch (1934) constructed consistent interval bounds on parameters of structural regression
equations that are subject to measurement error. Klepper and Leamer (1984) generalized
the Frisch bounds to multivariate regression models with measurement errors and constructed
consistent estimates. Gilstein and Leamer (1983) provided set-consistent estimation in a class
of nonlinear regression models where the identified set is an interval of parameters that are
robust to misspecification of the distribution of the error term. In a different development,
Phillips (1989) suggested that multi-collinearity may be a cause of partial identification in
a number of econometric models and provided some asymptotic results for Wald statistics
under such conditions. Hansen, Heaton, and Luttmer (1995) proposed an estimator for the
region of feasible means and variances of pricing kernels in asset pricing models and proved its
consistency. Manski and Tamer (2002) developed a number of models with interval-censored
data and derived several consistency results. In a previous version of this paper,
Chernozhukov, Hong, and Tamer (2002) developed consistency and inference results for linear
moment inequality models, using inversion of the econometric criterion functions, and
developed an empirical application. Imbens and Manski (2004) investigated the problem of Wald
inference in the case of a scalar mean parameter bounded above and below by other scalar
means. Recently, Andrews, Berry, and Jia (2004) and Pakes, Porter, Ishii, and Ho (2006)
investigated the inference problem using projection methods, which proceed by constructing
a region for a point-identified high-dimensional nuisance parameter and then projecting
it to obtain a confidence region for the partially identified functionals of
this parameter, such as the identified set. This projection method tends to be conservative
relative to the methods developed in this paper.^ Beresteanu and Molinari (2006) develop
inference methods for the linear regression model with interval-censored outcomes, using the
Wald statistic that measures the Hausdorff distance between the identified set and a set-valued
estimator. These methods differ from the criterion-based inference studied in this paper. Our
paper is also related to the literature on the weak identification problem, see notably Dufour


^Conservativity of projection methods in the point-identified setting is discussed in Romano and Wolf
(2000).


(1997) and Staiger and Stock (1997). However, the problem studied here differs considerably
from the latter, as the nature of the failure of point identification in our main applications
typically cannot be approximated by the weak identification framework.

The relationship of this paper to the statistical literature is as follows. Hannan (1982)
pointed out the multi-collinearity (i.e. set-identification) problems in several time series
models. Redner (1981) and Hannan and Deistler (1988) showed that a maximum likelihood
estimator eventually converges to a point in the identified set $\Theta_I$, though obviously it is
not consistent for estimation of $\Theta_I$. Veres (1987), Dacunha-Castelle and Gassiat (1999),
and Liu and Shao (2003) investigated the behavior of the likelihood ratio test under loss of
identifiability in correctly specified likelihood models, with a special focus on mixture
and ARMA models. These results do not apply or extend in any obvious way to the moment
condition models analyzed in this paper. Fukumizu (2003) pointed out that the likelihood
ratio has an unusually large stochastic order in multi-layer neural networks, which does not
apply to the moment condition problems analyzed in this paper. Also, the literature on image
processing considers the problem of support estimation of a density, e.g. Korostelev, Simar,
and Tsybakov (1995) and Cuevas and Fraiman (1997), though the structure of such problems
is different from that arising in the moment condition models analyzed here.

The rest of the paper is organized as follows. Section 2 presents the moment condition
models and several examples that will serve to illustrate the analysis. Section 2 also outlines
informally the main results of the paper. Section 3 develops consistency, rates of convergence,
and inference results that apply generically. The results require that simple high-level
conditions on the econometric criterion functions are met. Section 4 analyzes the moment inequality
and moment equality models in detail and verifies the conditions of Section 3. The Appendix
collects proofs and a definition of the notation used in the paper, and also discusses pointwise
confidence regions (confidence regions for particular parameter values in the identified set).

2.  Problem  Definition  and  Informal  Discussion  of  the  Main  Results 

Consider a nonnegative population criterion function $Q(\theta)$ which attains its minimal value
0 on a set $\Theta_I$, that is, $\Theta_I = \{\theta \in \Theta : Q(\theta) = 0\}$. The set $\Theta_I$ generally consists of many
parameter values, and thus is not a singleton. Suppose there is also a sample analog $Q_n(\theta)$ of this
function. The parameter $\theta$ belongs to the parameter space $\Theta$, which is a compact subset of the
Euclidean space $\mathbb{R}^d$. Every $\theta$ in $\Theta_I$ indexes an economic model that passes the testable empirical
restrictions that the criterion $Q(\theta)$ typically embodies in economic applications. This paper
investigates the estimators and confidence regions for $\Theta_I$ constructed using the contour sets of
the sample criterion function $Q_n$. (The Appendix also discusses a related problem of constructing
confidence regions for a particular point $\theta^*$ in $\Theta_I$.)

This  section  begins  with  a  review  of  the  main  econometric  models  and  economic  examples 
that  motivate  the  framework  described  above.  Then,  the  section  gives  an  informal  review  of 
the  methods  and  the  results  obtained  in  this  paper. 

2.1.  Moment  Condition  Models.  This  paper  is  primarily  concerned  with  applications  to 
two  main  types  of  econometric  structural  models:  moment  inequalities  and  moment  equalities. 
In  empirical  analysis,  the  moment  inequalities,  much  like  moment  equalities,  represent  testable 
restrictions on economic models. Economic models are described by the finite-dimensional
parameters $\theta \in \Theta \subset \mathbb{R}^d$, where $\Theta$ is the parameter space. We are interested in the set of
parameters $\Theta_I \subset \Theta$ that satisfy the testable restrictions.

The moment restrictions are computed with respect to the population probability law $P$
of the data and take the form

$$E_P[m_i(\theta)] \le 0, \tag{2.1}$$

where $m_i(\theta) = m(\theta, w_i)$ is a vector of moment functions parameterized by $\theta$ and determined by
a vector of real random variables $w_i$. Therefore the set of parameters $\theta$ that pass restrictions
(2.1) is given by $\Theta_I = \{\theta \in \Theta : E_P[m_i(\theta)] \le 0\}$.

It is interesting to comment on the structure of the set $\Theta_I$ in this model. When the moment
functions are linear in parameters, the set $\Theta_I$ is given by an intersection of linear half-spaces,
and could be a triangle, a trapezoid, or a polyhedron, as in Examples 1 and 2 introduced below.
When the moment functions are nonlinear, the set $\Theta_I$ is given by an intersection of nonlinear
half-spaces whose boundaries are defined by nonlinear manifolds.

The set $\Theta_I$ can be characterized as the set of minimizers of the criterion function^

$$Q(\theta) := \| E_P[m_i(\theta)]' W^{1/2}(\theta) \|_+^2, \tag{2.2}$$


^Let $\|x\|_+ = \|(x)_+\|$ and $\|x\|_- = \|(x)_-\|$, where $(x)_+ := \max(x, 0)$ and $(x)_- := \max(-x, 0)$.
where $W(\theta)$ is a continuous and diagonal matrix with strictly positive diagonal elements for
each $\theta \in \Theta$. Therefore, inference on $\Theta_I$ may be based on the empirical analog of $Q$:

$$Q_n(\theta) := \| E_n[m_i(\theta)]' W_n^{1/2}(\theta) \|_+^2, \qquad E_n[m_i(\theta)] := \frac{1}{n} \sum_{i=1}^n m_i(\theta), \tag{2.3}$$

where $W_n(\theta)$ is a uniformly consistent estimate of $W(\theta)$. In applications $W_n(\theta)$ can be taken
to be an identity matrix or chosen to weigh the individual empirical moments by estimates of
inverses of their individual variances.
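
To make (2.3) concrete, the following is a minimal Python sketch, assuming a diagonal weight matrix supplied as a list of positive weights. The function name `sample_criterion`, the argument layout, and the toy moment function are illustrative assumptions and do not come from the paper.

```python
from typing import Callable, Optional, Sequence


def sample_criterion(
    theta: float,
    data: Sequence[Sequence[float]],
    moment: Callable[[float, Sequence[float]], Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Q_n(theta) = || E_n[m_i(theta)]' W_n^{1/2} ||_+^2 for a diagonal W_n."""
    n = len(data)
    rows = [moment(theta, w) for w in data]
    k = len(rows[0])
    # Empirical moments E_n[m_i(theta)], one entry per inequality.
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    if weights is None:
        weights = [1.0] * k  # identity weighting (an assumption of this sketch)
    # Only violated inequalities (positive empirical moments) are penalized.
    return sum(w_j * max(m_j, 0.0) ** 2 for w_j, m_j in zip(weights, means))


# Hypothetical toy moments: theta must lie between the means of w[0] and w[1].
def toy_moment(theta: float, w: Sequence[float]) -> Sequence[float]:
    return (w[0] - theta, theta - w[1])


toy_data = [(0.0, 2.0), (1.0, 3.0)]  # sample means: 0.5 and 2.5
```

With diagonal weights, each positive-part moment is simply scaled by its own weight, which is why the sketch never forms a matrix square root explicitly.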

Moment equalities are more traditional in empirical analysis. The economic models,
indexed by $\theta$, are assumed to satisfy the set of testable restrictions given by moment equalities:

$$E_P[m_i(\theta)] = 0, \quad \text{that is,} \quad \Theta_I = \{\theta \in \Theta : E_P[m_i(\theta)] = 0\}. \tag{2.4}$$

When the moment functions are linear in parameters, the set $\Theta_I$ is either a point or a
hyperplane intersected with the parameter space $\Theta$. When the moment functions are nonlinear,
the set $\Theta_I$ is typically a manifold, which also includes the case of isolated points (a
zero-dimensional manifold).

The set $\Theta_I$ can be characterized as the set of minimizers of the generalized method of
moments function

$$Q(\theta) := \| E_P[m_i(\theta)]' W^{1/2}(\theta) \|^2, \tag{2.5}$$

where $W(\theta)$ is a continuous and positive-definite matrix for each $\theta \in \Theta$. The inference on $\Theta_I$
is based on the conventional generalized method-of-moments function

$$Q_n(\theta) := \| E_n[m_i(\theta)]' W_n^{1/2}(\theta) \|^2, \tag{2.6}$$

where $W_n(\theta)$ is a uniformly consistent estimate of $W(\theta)$. In applications $W_n(\theta)$ can be an
identity matrix or an estimate of the inverse of the asymptotic covariance matrix of the empirical
moment functions.

In many situations, we can also use the modified objective function for inference:

$$Q_n(\theta) - \inf_{\theta' \in \Theta} Q_n(\theta').$$

This modification is useful in cases where $Q_n$ does not attain the value 0 in finite samples.^


^In such cases, using the modified objective function typically leads to power improvements, as is well known
in point-identified cases.
2.2. Motivating Examples. There are several interesting examples of the moment
condition models described above, where the identified set $\Theta_I$ is naturally a collection of points,
rather than a single point.

Example 1 (Interval Data). The first example is motivated by missing data problems, where
$Y$ is an unobserved real random variable bracketed below by $Y_1$ and above by $Y_2$, both of
which are observed real random variables. The parameter of interest $\theta = E_P[Y]$ is known to
satisfy the restriction

$$E_P[Y_1] \le \theta \le E_P[Y_2].$$

Hence the identified set is an interval, $\Theta_I = \{\theta : E_P[Y_1] \le \theta \le E_P[Y_2]\}$. This example falls in
the moment-inequality framework with moment function

$$m_i(\theta) = (Y_{1i} - \theta, \; \theta - Y_{2i})'.$$

Therefore, $\Theta_I$ can be characterized as the set of minimizers of $Q(\theta) = \|E_P[m_i(\theta)]\|_+^2 =
(E_P[Y_{1i}] - \theta)_+^2 + (E_P[Y_{2i}] - \theta)_-^2$, with the sample analog $Q_n(\theta) = (E_n[Y_{1i}] - \theta)_+^2 + (E_n[Y_{2i}] - \theta)_-^2$.

Example 2 (Interval Outcomes in Regression Models). A regression generalization of the
previous example is immediate. Suppose a regressor vector $X_i$ is available, and the conditional
mean of the unobserved $Y_i$ is modeled using the linear function $X_i'\theta$. The parameters of this function
can be bounded using the inequality $E_P[Y_{1i}|X_i] \le X_i'\theta \le E_P[Y_{2i}|X_i]$. These conditional restrictions
imply that the following inequalities are valid:

$$E_P[Y_{1i} Z_i] \le \theta' E_P[X_i Z_i] \le E_P[Y_{2i} Z_i],$$

where $Z_i$ is a vector of positive transformations of $X_i$, for instance, $Z_i = (1(X_i \le x_j), j =
1, \ldots, J)'$, for a suitable collection of values $x_j$. These inequalities define the identified set $\Theta_I$,
which is therefore given by an intersection of linear half-spaces in $\mathbb{R}^d$. This example also falls
in the moment inequality framework, with the moment function given by

$$m_i(\theta) = ((Y_{1i} - \theta'X_i) Z_i', \; -(Y_{2i} - \theta'X_i) Z_i')'.$$
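
As an illustration of how the moment vector in Example 2 could be assembled for one observation, here is a minimal Python sketch. It assumes a scalar regressor and a scalar slope, so that $\theta'X_i$ reduces to a product; the function name `regression_moments` and the `cutoffs` argument (playing the role of the values $x_j$) are hypothetical.

```python
from typing import List, Sequence


def regression_moments(
    theta: float, y1: float, y2: float, x: float, cutoffs: Sequence[float]
) -> List[float]:
    """Moment vector ((Y1 - theta*X) Z', -(Y2 - theta*X) Z')' for one observation."""
    # Nonnegative instruments Z_j = 1(X <= x_j), one per cutoff value.
    z = [1.0 if x <= c else 0.0 for c in cutoffs]
    lower = [(y1 - theta * x) * z_j for z_j in z]   # lower-bound inequalities
    upper = [-(y2 - theta * x) * z_j for z_j in z]  # upper-bound inequalities
    return lower + upper


# Every entry is nonpositive when Y1 <= theta*X <= Y2 for this observation.
example = regression_moments(1.0, 0.5, 1.5, 1.0, cutoffs=[0.5, 2.0])
```

Averaging these vectors over the sample gives the empirical moments that enter the sample criterion function.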

In auction analysis, the bracketing of the latent response $Y$ (the bidder's valuation) by
functions of observed bids, $Y_1$ and $Y_2$, is very natural and occurs in a variety of settings, see
Haile and Tamer (2003). Analogous situations occur in income surveys, where only income
brackets are available instead of true income, see Manski and Tamer (2002). Chernozhukov,
Hong, and Tamer (2002) analyze this linear moment inequality setup in detail.
Example 3 (Optimal Choice of Economic Agents and Game Interactions). Analysis of the
optimal choice behavior of firms and economic agents is another area of application of (2.1).
Suppose that a firm can make two choices, $D_i = 0$ or $D_i = 1$. Suppose that the profit of
the firm from making the choice $D_i$ is given by $\pi(W_i, D_i, \theta) + U_i$, where $U_i$ is a disturbance
such that $E_P[U_i|X_i] = 0$, for $X_i$ representing the information available to make the decision, and
$W_i$ are various determinants of the firm's profit, some of which may be included in $X_i$. For
example, $W_i$ may include actions of other firms that affect the firm's profit. From a revealed
preference principle, the fact that the firm chooses $D_i$ necessarily implies that

$$E_P[\pi(W_i, D_i, \theta) | X_i] \ge E_P[\pi(W_i, 1 - D_i, \theta) | X_i]. \tag{2.7}$$

Therefore, we can take the moment condition in (2.1) to be

$$m_i(\theta) = (\pi(W_i, 1 - D_i, \theta) - \pi(W_i, D_i, \theta)) Z_i, \tag{2.8}$$

where $Z_i$ is the set of positive instrumental variables defined as positive transformations of
$X_i$, as in the previous example.

This simple example highlights the structure of empirically testable restrictions arising
from the optimizing behavior of firms and economic agents. These testable restrictions are
given in the form of moment inequality conditions. Note that this simple example
also allows for game-theoretic interactions among economic agents. Moment inequality
conditions of the above kind are ubiquitous, and are known to arise in (more realistic) dynamic
settings, see Bajari, Benkard, and Levin (2006), Ciliberto and Tamer (2003), and Ryan (2005).
Similar principles are used in Blundell, Browning, and Crawford (2005) to analyze bounds on
demand functions. Related ideas also appear in the area of stochastic revealed preference
analysis, e.g. see Varian (1984) and McFadden (2005).

Example 4 (Structural Equations). Consider the structural instrumental variable
estimation of returns to schooling. Suppose that we are interested in the following example where
potential income $Y$ is related to education $E$ through a flexible quadratic functional form,
$Y = \theta_0 + \theta_1 E + \theta_2 E^2 + \epsilon = X'\theta + \epsilon$, for $\theta = (\theta_0, \theta_1, \theta_2)'$ and $X = (1, E, E^2)'$. Although
parsimonious, this simple model is not point-identified in the presence of the standard
quarter-of-birth instrument suggested in Angrist and Krueger (1992).^ In the absence of point
identification, all parameter values $\theta$ consistent with the instrumental orthogonality restriction
$E_P[(Y - \theta'X)Z] = 0$ are of interest for purposes of economic analysis. Phillips (1989)
develops a number of related examples. Similar partial identification problems arise in nonlinear
moment and instrumental variables problems, see e.g. Demidenko (2000) and Chernozhukov
and Hansen (2005). In Chernozhukov and Hansen (2005), the parameters $\theta$ of the structural
quantile functions for returns to schooling satisfy the restrictions:

$$E_P[(\tau - 1(Y \le X'\theta)) Z] = 0,$$

where $\tau \in (0, 1)$ is the quantile of interest. This is an example of a nonlinear instrumental
variable model, where the identification region, in the absence of point identification, is
generally given by a nonlinear manifold. Chernozhukov and Hansen (2004) and Chernozhukov,
Hansen, and Jansson (2005) analyze an empirical returns-to-schooling example and a
structural demand example where such situations arise.

2.3. Informal Discussion of Results. The objective of this paper is to construct sets $C_n$
for $\Theta_I$ that are consistent estimates of $\Theta_I$, converge to $\Theta_I$ at the fastest rates, and have the
confidence interval property $\liminf_{n \to \infty} P(\Theta_I \subseteq C_n) = \alpha$, for a prespecified confidence level
$\alpha \in (0, 1)$.^ The sets $C_n$ we construct take the form of a contour set $C_n(c)$ of level $c$ of the
sample criterion function $Q_n$:

$$C_n(c) := \{\theta \in \Theta : a_n Q_n(\theta) \le c\},$$

for some appropriate normalization $a_n$, where $a_n = n$ in Examples 1-4. In order to simplify
the discussion, assume $a_n = n$ in this section only.
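
The contour set $C_n(c)$ can be approximated numerically by evaluating the criterion on a grid. The following is a minimal Python sketch under that assumption; the function name `contour_set`, the use of a finite grid in place of the parameter space, and the toy criterion are illustrative choices rather than the paper's procedure.

```python
from typing import Callable, List, Optional, Sequence


def contour_set(
    q_n: Callable[[float], float],
    grid: Sequence[float],
    n: int,
    c: float,
    a_n: Optional[float] = None,
) -> List[float]:
    """Grid approximation to C_n(c) = {theta : a_n * Q_n(theta) <= c}."""
    # Default normalization a_n = n, as assumed in this section of the text.
    scale = float(n) if a_n is None else a_n
    return [theta for theta in grid if scale * q_n(theta) <= c]


# Toy criterion (hypothetical): zero exactly on the interval [0.5, 2.5].
def toy_q(theta: float) -> float:
    return max(0.5 - theta, 0.0) ** 2 + max(theta - 2.5, 0.0) ** 2


grid = [i / 10 for i in range(41)]  # grid on [0, 4] with step 0.1
region = contour_set(toy_q, grid, n=100, c=2.0)
```

A finer grid gives a better approximation at higher cost; in more than one dimension the same idea applies to a grid of parameter vectors.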


^The  instrument  is  the  indicator  of  the  first  quarter  of  birth.  Sometimes  the  indicators  of  other  quarters  of 
birth  are  used  as  instruments.  However,  these  instruments  are  not  correlated  with  education  (correlation  is 
extremely  small)  and  thus  bring  no  additional  identification  information. 

^Robustness to perturbing $P$ is also discussed in the Addendum to this paper, which obtains the conditions
under which coverage holds under contiguous perturbations of $P$. In addition, Andrews and Guggenberger
(2006) establish global robustness of the subsampling confidence regions proposed in this paper in a class
of moment inequality problems. Sheikh (2006) establishes global robustness of our subsampling regions in
Example 1.
In order to estimate $\Theta_I$ consistently, the level $c = \hat c$ (which can be data-dependent) needs
to diverge to infinity slowly; for concreteness, we can set $\hat c = \ln n$. However, in a class
of problems, $\hat c$ does not need to diverge, so we can set $0 \le \hat c = O_p(1)$ and even $\hat c = 0$.
E.g., in Example 1, $\hat c = 0$ gives us $C_n(0) = [E_n[Y_1], E_n[Y_2]]$, which clearly is consistent for the
region $[E_P[Y_1], E_P[Y_2]]$. Generally, whether $\hat c$ can be non-diverging while maintaining
consistency depends on whether the sample criterion function $a_n Q_n(\theta)$ has degenerate behavior,
i.e. vanishes, over contractions of $\Theta_I$ in large samples, as formally stated in Section 3. In
particular, the latter property does not hold in Example 4, but typically holds in Examples
1-3, under conditions formally stated in Section 4.
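
For Example 1 the contour set has a simple closed form, which the following Python sketch computes. The closed form is a short derivation from the Example 1 criterion, assuming $a_n = n$ and that the sample mean of $Y_1$ does not exceed that of $Y_2$: outside the sample interval the criterion is a squared distance to the nearer endpoint, so $C_n(c)$ widens the sample interval by $\sqrt{c/n}$ on each side. The function name `interval_contour_set` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def interval_contour_set(
    y1: Sequence[float], y2: Sequence[float], c: float
) -> Tuple[float, float]:
    """Endpoints of C_n(c) in the interval-data example, with a_n = n."""
    n = len(y1)
    lo = sum(y1) / n  # E_n[Y_1]
    hi = sum(y2) / n  # E_n[Y_2]
    if lo > hi:
        raise ValueError("sketch assumes E_n[Y_1] <= E_n[Y_2]")
    # n * (distance to the sample interval)^2 <= c  <=>  distance <= sqrt(c / n).
    half_width = math.sqrt(c / n)
    return (lo - half_width, hi + half_width)


y1 = [0.0, 1.0, 0.0, 1.0]
y2 = [2.0, 3.0, 2.0, 3.0]
plug_in = interval_contour_set(y1, y2, c=0.0)                   # level c = 0
slow_level = interval_contour_set(y1, y2, c=math.log(len(y1)))  # level c = ln n
```

With $c = 0$ the function returns the sample interval itself, and with $c = \ln n$ the extra width $\sqrt{\ln n / n}$ shrinks to zero as $n$ grows.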

The analysis of the rates of convergence and consistency makes use of the Hausdorff
distance between sets, which is defined as

$$d_H(A, B) := \max\Big\{ \sup_{a \in A} d(a, B), \; \sup_{b \in B} d(b, A) \Big\}, \quad \text{where} \quad d(b, A) := \inf_{a \in A} \|b - a\|,$$

and $d_H(A, B) := \infty$ if either $A$ or $B$ is empty. The motivation for the use of this metric comes
from it being a natural generalization of the Euclidean distance and its previous uses by other
authors in the consistency analysis in the context of set estimation, see Hansen, Heaton, and
Luttmer (1995). The general consistency result

$$d_H(C_n(\hat c), \Theta_I) \to_p 0,$$

obtained in this paper follows from the uniform convergence of the sample function $Q_n$ to the
continuous limit function $Q$ over the compact parameter space $\Theta$, where the rate of
convergence over the set $\Theta_I$ is $1/a_n$. Such a uniform convergence condition is conventional in the econometric
literature, and is thus easily verifiable.
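
The Hausdorff distance defined above can be computed directly when both sets are finite collections of points, for example grid approximations of the estimated and identified sets. The following minimal Python sketch assumes that finite-set representation; the names `dist_to_set` and `hausdorff` are illustrative.

```python
import math
from typing import Sequence

Point = Sequence[float]


def dist_to_set(b: Point, A: Sequence[Point]) -> float:
    """d(b, A) = inf over a in A of the Euclidean distance ||b - a||."""
    return min(math.dist(b, a) for a in A)


def hausdorff(A: Sequence[Point], B: Sequence[Point]) -> float:
    """d_H(A, B) for finite point sets; infinite if either set is empty."""
    if not A or not B:
        return math.inf
    sup_a = max(dist_to_set(a, B) for a in A)  # sup over a in A of d(a, B)
    sup_b = max(dist_to_set(b, A) for b in B)  # sup over b in B of d(b, A)
    return max(sup_a, sup_b)


A = [(0.0,), (1.0,)]
B = [(0.0,), (3.0,)]
d = hausdorff(A, B)  # 2.0: the point 3 in B lies at distance 2 from A
```

Both one-sided suprema are needed: a set estimate that is too large and one that is too small are penalized alike.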

The rates of convergence follow from the existence of polynomial minorants on $Q_n(\theta)$
over suitable neighborhoods of $\Theta_I$, as defined formally in Section 3. The existence of quadratic
minorants on $Q_n$ in Examples 1-4, as verified in Section 4, implies that

$$d_H(C_n(\hat c), \Theta_I) = O_p\big(\sqrt{\max(\hat c, 1)/n}\big),$$

which is very close to the $1/\sqrt{n}$ rate of convergence, and is exactly $1/\sqrt{n}$ in many moment
inequality problems (such as Examples 1 and 2, where $a_n Q_n$ has degenerate asymptotics
over contractions of $\Theta_I$).
In order for $C_n(\hat c)$ to have the confidence region property for $\Theta_I$, we need to choose the level
$\hat c = \hat c(\alpha)$, such that $\hat c(\alpha)$ is a consistent estimate of the $\alpha$-quantile of the statistic:

$$\mathcal{C}_n := \sup_{\theta \in \Theta_I} a_n Q_n(\theta), \tag{2.9}$$

which is a quasi-likelihood-ratio type quantity. The estimates $\hat c(\alpha)$ can be based on the
limit distributions of (2.9) or a generic subsampling method, which are developed,
respectively, in Section 4 and Section 3.5. For instance, in Example 1, suppose $(\sqrt{n}(E_n[Y_1] -
E_P[Y_1]), \sqrt{n}(E_n[Y_2] - E_P[Y_2])) \to_d (W_1, W_2) = N(0, \Omega)$; then Section 3 shows that $\mathcal{C}_n \to_d \mathcal{C} =
\max[(W_1)_+^2, (W_2)_-^2]$, where the distribution of $\mathcal{C}$ can be easily obtained by simulation methods
(see Section 4). For the cases when the limit distribution of (2.9) is not easily tractable,
the paper constructs a generic subsampling estimate $\hat c(\alpha)$, which is based on subsampling an
approximation of statistic (2.9), where one uses the consistent estimate $C_n(\hat c)$ in place of the
unknown set $\Theta_I$ in (2.9) (see Section 3.5 for a detailed description of the algorithm).
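
The simulation approach mentioned for Example 1 can be sketched as follows. This minimal Python example draws $(W_1, W_2)$ from a bivariate normal with a known (or separately estimated) positive-definite covariance matrix and returns the empirical $\alpha$-quantile of $\max[(W_1)_+^2, (W_2)_-^2]$. The function name `simulate_critical_value`, the number of draws, and the fixed seed are assumptions of this sketch, not details given in the text.

```python
import math
import random


def simulate_critical_value(
    omega11: float,
    omega12: float,
    omega22: float,
    alpha: float,
    draws: int = 20000,
    seed: int = 0,
) -> float:
    """Simulated alpha-quantile of max[(W1)_+^2, (W2)_-^2], (W1, W2) ~ N(0, Omega)."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix (requires positive definiteness).
    l11 = math.sqrt(omega11)
    l21 = omega12 / l11
    l22 = math.sqrt(omega22 - l21 ** 2)
    stats = []
    for _ in range(draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w1 = l11 * z1
        w2 = l21 * z1 + l22 * z2
        # (w)_+ = max(w, 0) and (w)_- = max(-w, 0), as in the paper's notation.
        stats.append(max(max(w1, 0.0) ** 2, max(-w2, 0.0) ** 2))
    stats.sort()
    index = min(draws - 1, max(0, math.ceil(alpha * draws) - 1))
    return stats[index]


c_hat = simulate_critical_value(1.0, 0.0, 1.0, alpha=0.95)
```

In practice the covariance entries would be replaced by consistent estimates from the data, and the resulting `c_hat` would serve as the level of the contour set.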

The paper characterizes the asymptotic behavior and derives the limit distribution of the
statistic $\mathcal{C}_n$. The paper also characterizes the limit distribution of related statistics used
to determine the probability of false coverage (the probability of covering larger sets than $\Theta_I$).
(This in turn characterizes the power properties of the testing procedure implicitly defined
by the confidence region.) The non-equicontinuous behavior of the empirical process $\theta \mapsto
a_n Q_n(\theta)$ in e.g. Examples 1-3 poses a challenge to this analysis, which is addressed through
a generalization of epi-convergence and stochastic equi-semi-continuity (Knight 1999) to the
set-identified case. The "parameter-on-the-boundary" problem is another challenge in this
analysis; e.g. it arises in Example 4, where the identified set $\Theta_I$, defined as an intersection of
a hyperplane with $\Theta$, generally has common points with the boundary of $\Theta$, defined relative
to $\mathbb{R}^d$. This challenge is addressed through an appropriate generalization of "Chernoff
regularity", the condition that in the point-identified case requires convergence of the local
parameter space to a cone (Chernoff 1954, Andrews 2001). The generalization requires
convergence of a graph of the local parameter space to an appropriate limit graph.

3.  General  Estimation  and  Inference  in  Large  Samples 

This section defines the estimators and confidence regions formally and develops the basic
results on consistency, rates of convergence, and coverage properties of these regions. The
section develops general conditions that parallel those used in extremum estimation in
point-identified cases (Amemiya 1985, Newey and McFadden 1994, van der Vaart 1998). Section 4
illustrates and verifies these conditions for the moment condition models.

3.1. Basic Setup. The parameter space $\Theta$ is a non-empty compact subset of $\mathbb{R}^d$ equipped
with the subspace topology relative to $\mathbb{R}^d$. The data $W_1, \ldots, W_n$ form a random vector defined on a
complete probability space $(\Omega, \mathcal{F}, P)$. Suppose that the sample criterion function
$Q_n(\theta) = Q_n(\theta, W_1, \ldots, W_n)$ is available, and that $Q_n$ converges uniformly to a continuous criterion
function $Q \ge 0$ that attains the minimal value 0 on $\Theta_I$. The contour sets of $Q_n$ will be used for
estimation and inference on $\Theta_I$. This approach therefore employs the classical duality
principle of inverting a likelihood-type test statistic to obtain confidence regions.

Regarding the notation used in the paper, the $\epsilon$-expansion of $\Theta_I$ in $\Theta$ is defined as
$\Theta_I^\epsilon := \{\theta \in \Theta : d(\theta, \Theta_I) \le \epsilon\}$. Unless an ambiguity arises, $\sup_A f$ is used to denote
$\sup_{a \in A} f(a)$. The notions of stochastic convergence, e.g. convergence in (outer) probability, denoted
as $\to_p$, and the stochastic order symbols $O_p$ and $o_p$ are defined with respect to the outer probability
$P^*$, as in van der Vaart and Wellner (1996); wp $\to$ 1 stands for "with (inner) probability approaching
1." For any two numbers $a$ and $b$, $a \wedge b$ denotes $\min(a, b)$, and $a \vee b$ denotes $\max(a, b)$. For
convenience, Appendix A collects other definitions and notations.

3.2.  Consistency  and  Rates  of  Convergence  in  The  General  Cases.  Let  the  following 
assumption  hold. 

Condition C.1 (Uniform Convergence and Continuity). (a) $\Theta$ is a non-empty compact subset
of $\mathbb{R}^d$, (b) $Q : \Theta \to \mathbb{R}_+$ is continuous and $\min_\Theta Q = 0$; let $\Theta_I := \arg\min_\Theta Q$,
(c) $Q_n(\theta) = Q_n(\theta, W_1, \ldots, W_n)$ takes values in $\mathbb{R}_+$ and is jointly measurable in the parameter $\theta$
and the data $W_1, \ldots, W_n$ defined on a complete probability space $(\Omega, \mathcal{F}, P)$,
(d) $\sup_\Theta |Q_n - Q| = O_p(1/b_n)$ for a sequence of constants $b_n \to \infty$, and
(e) $\sup_{\Theta_I} Q_n = O_p(1/a_n)$ for a sequence of constants $a_n \to \infty$.

Condition C.1 assumes uniform convergence of the criterion function $Q_n$ to the limit $Q$.
It also identifies $\Theta_I$ as the set of minimizers of the limit criterion function $Q$. The assumptions
$Q \ge 0$ and $Q_n \ge 0$ are not restrictive.^7 In C.1(c), measurability is assumed to hold with
respect to the product of the sigma-field $\mathcal{F}$ and the Borel sigma-field of $\Theta$. C.1(c) guarantees
measurability of $\sup_{\Theta_I} Q_n$ and related statistics; see e.g. van der Vaart and Wellner (1996),
p. 47.^8

^7 Given a function $\tilde Q_n : \Theta \to \mathbb{R}$ and its continuous uniform limit $\tilde Q : \Theta \to \mathbb{R}$, we can define
$Q_n(\theta) = \tilde Q_n(\theta) - \inf_{\theta' \in \Theta} \tilde Q_n(\theta')$ and $Q(\theta) = \tilde Q(\theta) - \inf_{\theta' \in \Theta} \tilde Q(\theta')$ to reach this assumption.

Condition C.1 also defines the principal quantity $\sup_{\Theta_I} Q_n$, which plays the crucial role in
the analysis of inference. Its rate of convergence to zero, $a_n$, plays the key role in the analysis of
consistency and rates of convergence. Section 4 shows that in the models of Section 2, $a_n = n$.

The contour sets of $a_n Q_n$ form the class of estimates we consider. The level $c$ contour set
of $a_n Q_n$ is defined as

$$C_n(c) := \{\theta \in \Theta : a_n Q_n(\theta) \le c\}, \qquad (3.1)$$

where $c \ge 0$. Next let $\hat c$ be a sequence of non-negative random real variables such that $\hat c \to_p \infty$
slowly so that $\hat c / a_n \to_p 0$. For instance, when $a_n = n$, we can set $\hat c = \ln n$. We will discuss
the choice of $\hat c$ further when we consider inference. The estimates and confidence regions will
generally take the form $\hat\Theta_I := C_n(\hat c)$.
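
As an illustration of definition (3.1), the following is a minimal sketch that approximates the contour set on a grid, using the criterion of Example 1 with $a_n = n$ and $\hat c = \ln n$. The simulated data, the grid for $\Theta$, and the function names `q_n` and `contour_set` are assumptions made for this illustration only and are not part of the paper.

```python
import numpy as np

def q_n(theta, y1_bar, y2_bar):
    """Example 1 criterion: (E_n[Y1] - theta)_+^2 + (E_n[Y2] - theta)_-^2."""
    theta = np.asarray(theta, dtype=float)
    return np.maximum(y1_bar - theta, 0.0) ** 2 + np.minimum(y2_bar - theta, 0.0) ** 2

def contour_set(grid, y1_bar, y2_bar, a_n, c):
    """Grid approximation to C_n(c) = {theta in Theta : a_n * Q_n(theta) <= c}."""
    grid = np.asarray(grid, dtype=float)
    return grid[a_n * q_n(grid, y1_bar, y2_bar) <= c]

rng = np.random.default_rng(0)
n = 1000
y1 = rng.normal(0.0, 1.0, n)          # hypothetical data with E[Y1] = 0
y2 = rng.normal(1.0, 1.0, n)          # hypothetical data with E[Y2] = 1
grid = np.linspace(-1.0, 2.0, 3001)   # discretized parameter space Theta = [-1, 2]

# Estimator C_n(c_hat) with a_n = n and c_hat = ln n.
est = contour_set(grid, y1.mean(), y2.mean(), a_n=n, c=np.log(n))
print(est.min(), est.max())           # endpoints of the estimated set
```

The grid resolution limits the accuracy of the reported endpoints; a finer grid or a closed-form solution would be preferable when one is available.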

A condition that determines the rate of convergence of $C_n(\hat c)$ to $\Theta_I$ is the following.

Condition C.2 (Existence of a Polynomial Minorant). There exist positive constants $(\delta, \kappa, \gamma)$
such that for any $\epsilon \in (0, 1)$ there are $(\kappa_\epsilon, n_\epsilon)$ such that for all $n \ge n_\epsilon$,

$$Q_n(\theta) \ge \kappa \cdot [d(\theta, \Theta_I) \wedge \delta]^\gamma$$

uniformly on $\{\theta \in \Theta : d(\theta, \Theta_I) \ge (\kappa_\epsilon / a_n)^{1/\gamma}\}$,^9 with probability at least $1 - \epsilon$.

Condition C.2 states that $Q_n$ can be stochastically bounded below by a polynomial over
a neighborhood of $\Theta_I$. C.2 parallels the conditions used to derive the rate of convergence of
estimators in the point-identified cases.

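Theorem 3.1 (Coverage, Consistency, and Rates of Convergence of $C_n(\hat c)$). Let $\hat\Theta_I = C_n(\hat c)$,
where $\hat c \to_p \infty$ such that $\hat c / a_n \to_p 0$. Suppose that $\Theta_I \ne \Theta$; then (1) C.1 implies that
$\Theta_I \subseteq \hat\Theta_I$ wp $\to$ 1 and $d_H(\hat\Theta_I, \Theta_I) = o_p(1)$, and (2) C.1 and C.2 imply that
$d_H(\hat\Theta_I, \Theta_I) = O_p((\hat c / a_n)^{1/\gamma})$. Suppose that $\Theta_I = \Theta$; then (3) C.1 implies that
$d_H(\hat\Theta_I, \Theta_I) = 0$ wp $\to$ 1.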


^8 The condition of joint measurability is only needed to simplify exposition, following a suggestion of a
referee. Otherwise, we can easily drop this condition, since we allow for stochastic convergence in the sense of
Hoffmann-Jorgensen. In this case, under the other assumptions stated, the primary statistics are
asymptotically measurable.

^9 When $\Theta_I = \Theta$, this set is empty, in which case C.2 does not apply.

Parts (1) and (2) address the case of partial identification, when $\Theta_I \ne \Theta$. In some
sense, this is the typical case for applications, and thus our interest lies primarily in this case.
The consistency results (1) and (2), stated in terms of the Hausdorff metric, generalize those
obtained for point-identified cases; see e.g. van der Vaart (1998). Both the consistency and
rate results are new for the problem studied in this paper.^10 Section 4 shows that in the
moment-condition models of Section 2, $Q_n$ is locally quadratic, i.e. $\gamma = 2$, and $a_n = n$. It
follows by Theorem 3.1 that the convergence rate can be made arbitrarily close to $1/\sqrt{n}$.^11

Part (3) addresses the case of complete non-identification, when $\Theta_I = \Theta$, in which case
the estimator converges to $\Theta_I$ in the Hausdorff metric faster than any rate. This case is not
of prime interest, and is stated for completeness.

Example 1 (contd.) In Example 1, recall that $Q_n(\theta) = (E_n[Y_1] - \theta)_+^2 + (E_n[Y_2] - \theta)_-^2$
and $Q(\theta) = (E_P[Y_1] - \theta)_+^2 + (E_P[Y_2] - \theta)_-^2$. Suppose
$(\sqrt{n}(E_n[Y_1] - E_P[Y_1]), \sqrt{n}(E_n[Y_2] - E_P[Y_2]))' \to_d (W_1, W_2)' = N(0, \Omega)$.
Then $\sup_\Theta |Q_n - Q| = O_p(1/\sqrt{n})$ while $\sup_{\Theta_I} |Q_n - Q| = O_p(1/n)$, so that $b_n = \sqrt{n}$ and
$a_n = n$. By Theorem 3.1, $C_n(\ln n)$ consistently estimates $\Theta_I = [E_P[Y_1], E_P[Y_2]]$. Further, it is
simple to verify that C.2 holds with $\gamma = 2$. Hence by Theorem 3.1 the set $C_n(\ln n)$ is consistent
exactly at the $\sqrt{\ln n / n}$ rate. Note, however, that the set $C_n(0) = [E_n[Y_1], E_n[Y_2]]$ consistently
estimates $[E_P[Y_1], E_P[Y_2]]$ at the $1/\sqrt{n}$ rate. Hence, in this example and many others, but not all,
it is possible to achieve the rate $1/a_n^{1/\gamma}$ exactly. Section 3.3 below develops this point further.
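
The two rates in this example can be compared numerically. The sketch below uses the closed form of $C_n(c)$ in Example 1 for the case $E_n[Y_1] \le E_n[Y_2]$, where $n Q_n(\theta) \le c$ holds exactly on $[E_n[Y_1] - \sqrt{c/n}, E_n[Y_2] + \sqrt{c/n}]$, and it ignores the boundary of $\Theta$. The population means, the normal sampling error, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def interval_contour(y1_bar, y2_bar, n, c):
    """Closed form of C_n(c) in Example 1 when y1_bar <= y2_bar (Theta taken large)."""
    if y1_bar > y2_bar:
        raise ValueError("sketch covers only the case y1_bar <= y2_bar")
    half = np.sqrt(c / n)
    return (y1_bar - half, y2_bar + half)

def hausdorff_intervals(a, b):
    """Hausdorff distance between closed intervals a = (a0, a1) and b = (b0, b1)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

mu1, mu2 = 0.0, 1.0                  # hypothetical E_P[Y1], E_P[Y2]
rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    # Sample means with sampling error of order 1/sqrt(n).
    y1_bar = mu1 + rng.normal() / np.sqrt(n)
    y2_bar = mu2 + rng.normal() / np.sqrt(n)
    d_log = hausdorff_intervals(interval_contour(y1_bar, y2_bar, n, np.log(n)), (mu1, mu2))
    d_zero = hausdorff_intervals(interval_contour(y1_bar, y2_bar, n, 0.0), (mu1, mu2))
    print(n, d_log, d_zero)          # distance with c = ln n versus c = 0
```

The distance for $c = \ln n$ carries the extra $\sqrt{\ln n / n}$ expansion term, while the distance for $c = 0$ reflects only the sampling error of the two means.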

^10 The consistency result differs from an earlier result by Manski and Tamer (2002) that derives consistency
of the set $\{\theta \in \Theta : Q_n(\theta) \le c/b_n\}$ where $b_n = \sqrt{n}$ in regular cases. We in fact show consistency of smaller sets,
replacing $b_n$ with $a_n \gg b_n$, where $a_n = n$ in regular cases. More generally, $b_n$ and $a_n$ are defined by C.1(d,e).

^11 In other examples like the ones considered in Kim and Pollard (1990), $a_n = n^{2/3}$ and $\gamma = 2$, giving the
rate of convergence $n^{1/3}$.

Remark 3.1. (A counter-example) The following example shows why it is not possible to
achieve the sharp rate of convergence $1/a_n^{1/\gamma}$ by setting $\hat c = O_p(1)$ in all cases. Setting $\hat c = O_p(1)$
may in general lead to inconsistency. Consider the following trivial example that illustrates
the source of the inconsistency. Let $\Theta = [0, 3]$, $Q(\theta) = 0$ for each $\theta \in [0, 2]$ and $Q(\theta) = 1$ for
each $\theta \in (2, 3]$, so that $\Theta_I = [0, 2]$; $Q_n(\theta) = \chi^2/n$ for $\theta \in [0, 1]$, $Q_n(\theta) = 0$ for $\theta \in (1, 2]$, and
$Q_n(\theta) = 1$ for $\theta \in (2, 3]$, where $\chi^2$ is a chi-square variable. Then Theorem 3.1(1) applies with
$a_n = n$ to claim that $C_n(\ln n)$ is consistent. However, $C_n(c)$ for a fixed $c > 0$ is not consistent,
since $d_H(C_n(c), \Theta_I) = d_H([1, 2], \Theta_I) = 1$ with the asymptotic probability $\Pr(\chi^2 > c) > 0$,
while $d_H(C_n(c), \Theta_I) = d_H([0, 2], \Theta_I) = 0$ with the asymptotic probability $\Pr(\chi^2 \le c) < 1$.
Therefore, with a positive probability the set $C_n(c)$ does not cover substantial portions of
the set $\Theta_I$, and the Hausdorff distance between $C_n(c)$ and $\Theta_I$ does not converge to 0.
Inconsistency of this kind extends to more general cases such as Example 4. Thus, $\hat c \to_p \infty$ is
needed to achieve consistency generally.
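
The counter-example can be checked by simulation. In the sketch below, the Hausdorff distance to $\Theta_I = [0, 2]$ equals 1 exactly when the chi-square draw exceeds the cutoff. The degrees of freedom (`df=1`) are an assumption, since the remark does not specify them, and the function name is hypothetical.

```python
import numpy as np

def miss_frequency(c, df=1, reps=100_000, seed=0):
    """Monte Carlo frequency of d_H(C_n(c), Theta_I) = 1 in the counter-example.

    On [0, 1] we have n * Q_n = chi2, so C_n(c) drops [0, 1] exactly when chi2 > c
    (taking n > c, so that the region (2, 3] is always excluded)."""
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(df, size=reps)   # df is an assumption of this sketch
    return float(np.mean(draws > c))

# A fixed cutoff gives a miss probability that does not shrink with n ...
print(miss_frequency(1.0))
# ... whereas c = ln n drives it toward zero as n grows.
for n in (10, 1_000, 1_000_000):
    print(n, miss_frequency(np.log(n)))
```

The distribution of $n Q_n$ on $[0, 1]$ does not depend on $n$ here, which is why only the growth of the cutoff can restore consistency.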

3.3. Consistency and Rates of Convergence with "Degenerate Interior". In many
moment inequality problems, the exact rate of convergence $1/a_n^{1/\gamma}$ can be attained by setting
$\hat c = O_p(1)$ or even $\hat c = 0$. The discussion of Example 1 above provides the simplest instance
where this is possible. The reason is that in moment-inequality problems, the criterion function
$Q_n$ can have degenerate asymptotics, i.e. vanish, over subsets of the identified set $\Theta_I$ that
can approximate $\Theta_I$. Consistency and rate results then follow, because $C_n(c)$ includes these
subsets even when $c = 0$.

In  order  to  discuss  this  property  formally,  consider  the  following  condition: 

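Condition C.3 (Degeneracy). There exists a constant $\eta > 0$ and a collection of subsets
$\{\Theta_I^{-\epsilon}, \epsilon \in [0, \eta]\}$ of $\Theta_I$ such that (a) $d_H(\Theta_I^{-\epsilon}, \Theta_I) \le \epsilon$ for all $\epsilon \in [0, \eta]$, (b) for any
$\epsilon \in [0, \eta]$, there is $n_\epsilon$ such that for all $n \ge n_\epsilon$, $P\{\sup_{\Theta_I^{-\epsilon}} a_n Q_n = 0\} = 1$, (c) there exists
$\gamma > 0$ such that for any $\epsilon > 0$ there are constants $(\kappa_\epsilon, n_\epsilon)$ such that for all $n \ge n_\epsilon$,
$P\{\sup_{\Theta_I^{-(\kappa_\epsilon/a_n)^{1/\gamma}}} a_n Q_n = 0\} \ge 1 - \epsilon$.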

In the remainder of the paper, we take $\Theta_I^{-\epsilon}$ to be an $\epsilon$-contraction of the set $\Theta_I$, that is

$$\Theta_I^{-\epsilon} := \{\theta \in \Theta_I : d(\theta, \Theta \setminus \Theta_I) \ge \epsilon\}, \qquad (3.2)$$

where $\epsilon \ge 0$,^12 although, in principle, Condition C.3 does not require the sets $\Theta_I^{-\epsilon}$ to be
$\epsilon$-contractions of $\Theta_I$. C.3(a-b) typically arises in the moment inequality models due to all
finite-sample moment inequalities being satisfied on $\epsilon$-contractions $\Theta_I^{-\epsilon}$ with probability converging
to 1, which makes the criterion function vanish on $\Theta_I^{-\epsilon}$. Condition C.3(c) further puts a
rate assumption on exactly how this happens. Section 4 verifies this condition in our main
applications.

Theorem 3.2 (Consistency and Rates of Convergence of $C_n(\hat c)$ with Degenerate Interiors). Let
$\hat\Theta_I$ denote $C_n(\hat c)$ where $\hat c \ge 0$ with probability 1 and $\hat c \to_p c \ge 0$. Then, (1) C.2 and C.3(a,b)
imply that $d_H(\hat\Theta_I, \Theta_I) = o_p(1)$, and (2) C.2 and C.3 imply that $d_H(\hat\Theta_I, \Theta_I) = O_p(1/a_n^{1/\gamma})$.
Moreover, if $\Theta_I = \Theta$ and $\sup_{\Theta_I} a_n Q_n = 0$ wp $\to$ 1, then (3) $d_H(\hat\Theta_I, \Theta_I) = 0$ wp $\to$ 1.

^12 Note that this set is always well-defined, although it may be an empty set.

Parts (1) and (2) contain the results of primary interest, which state that if C.3 holds, the
rate $1/a_n^{1/\gamma}$ is achieved exactly. In particular, the smallest contour set, $C_n(0)$, is consistent and
converges to $\Theta_I$ at the rate $1/a_n^{1/\gamma}$. Section 4 shows that in many moment inequality examples,
C.3(b) and C.3(c) hold with $\gamma = 2$ and $a_n = n$, yielding the rate of convergence $1/\sqrt{n}$. Part
(3) addresses the less typical case of complete non-identification, $\Theta_I = \Theta$, and degenerate
behavior of $a_n Q_n$ on $\Theta_I$, in which case the estimator converges to $\Theta_I$ in the Hausdorff metric
faster than any rate. This case is not of prime interest; we state it for completeness.

Example 1 (contd.) To clarify the role of C.3, recall Example 1. Clearly, for a sufficiently
small $\epsilon > 0$, $\Theta_I^{-\epsilon} = [E_P[Y_1] + \epsilon, E_P[Y_2] - \epsilon]$ can approximate $\Theta_I = [E_P[Y_1], E_P[Y_2]]$ in the
Hausdorff metric, provided $\Theta_I$ is not a singleton. Since $Q_n(\theta) = (E_n[Y_1] - \theta)_+^2 + (E_n[Y_2] - \theta)_-^2$,
with probability converging to 1, $Q_n = 0$ on $\Theta_I^{-\epsilon}$. Further, for any $\epsilon > 0$, a constant $\kappa_\epsilon$ can
be found such that $Q_n = 0$ on $\Theta_I^{-\kappa_\epsilon/\sqrt{n}} = [E_P[Y_1] + \kappa_\epsilon/\sqrt{n}, E_P[Y_2] - \kappa_\epsilon/\sqrt{n}]$ with probability
at least $1 - \epsilon$ in large samples. Thus, $C_n(c)$ is consistent at rate $1/\sqrt{n}$ in this example.
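
The degeneracy property in this example can be illustrated numerically. Since $Q_n(\theta) = 0$ exactly when $E_n[Y_1] \le \theta \le E_n[Y_2]$, the criterion vanishes on the contraction $[E_P[Y_1] + \kappa/\sqrt{n}, E_P[Y_2] - \kappa/\sqrt{n}]$ if and only if $E_n[Y_1]$ lies below its lower end and $E_n[Y_2]$ lies above its upper end. The sketch below assumes independent normal data with known variance, so that the sample means can be drawn directly; these distributional choices and the function name are assumptions for illustration.

```python
import numpy as np

def vanishing_probability(kappa, n, mu1=0.0, mu2=1.0, sigma=1.0, reps=100_000, seed=0):
    """Monte Carlo estimate of P{Q_n = 0 on [mu1 + kappa/sqrt(n), mu2 - kappa/sqrt(n)]}."""
    rng = np.random.default_rng(seed)
    # The mean of n i.i.d. N(mu, sigma^2) draws is exactly N(mu, sigma^2 / n).
    y1_bar = rng.normal(mu1, sigma / np.sqrt(n), reps)
    y2_bar = rng.normal(mu2, sigma / np.sqrt(n), reps)
    lower = mu1 + kappa / np.sqrt(n)
    upper = mu2 - kappa / np.sqrt(n)
    # Q_n vanishes on [lower, upper] iff y1_bar <= lower and y2_bar >= upper.
    return float(np.mean((y1_bar <= lower) & (y2_bar >= upper)))

for kappa in (1.0, 2.0, 3.0):
    print(kappa, vanishing_probability(kappa, n=1000))
```

Under these assumptions the probability depends on $\kappa$ but not on $n$, which is the sense in which a constant $\kappa_\epsilon$ can be chosen to reach any target level $1 - \epsilon$.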

3.4. Confidence Regions. The question that arises next is how to choose $c$ to guarantee that
$C_n(c)$ has a confidence region property. The inferential properties of the sets $C_n(c)$ are determined
by the statistic

$$\mathcal{C}_n := \sup_{\theta \in \Theta_I} a_n Q_n(\theta). \qquad (3.3)$$

Indeed, Lemma 3.1 below shows that the event $\{\mathcal{C}_n \le c\}$ is equivalent to the event $\{\Theta_I \subseteq C_n(c)\}$.
If quantiles of $\mathcal{C}_n$ or good upper bounds on them are known, finite-sample inference can be
conducted.^13 This paper provides asymptotic estimates of quantiles of $\mathcal{C}_n$, using either a
generic subsampling method, developed in Section 3.5, or the asymptotic limits for $\mathcal{C}_n$ in the
moment condition problems, developed in Section 4.
The  following  basic  condition  is  required  to  hold. 

Condition C.4 (Convergence of $\mathcal{C}_n$). Suppose that $P\{\mathcal{C}_n \le c\} \to P\{\mathcal{C} \le c\}$ for each
$c \in [0, \infty)$, where the distribution function of $\mathcal{C}$ is non-degenerate and continuous on $[0, \infty)$.

Section 4 verifies C.4 for moment condition models.

^13 For instance, the upper bounds on quantiles can be obtained using the maximal inequalities for empirical
processes.

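Lemma 3.1 (Basic Large Sample Confidence Regions). (1) Under C.1, the event $\{\mathcal{C}_n \le c\}$
is equivalent to the event $\{C_n(c)$ covers $\Theta_I\}$. (2) Suppose that C.4 holds. Then for any
$\hat c \to_p c(\alpha) := \inf\{c \ge 0 : P\{\mathcal{C} \le c\} \ge \alpha\}$ for $\alpha \in (0, 1)$, such that $\hat c \ge 0$ with probability 1, we
have that $\lim_n P\{\Theta_I \subseteq C_n(\hat c)\} = \lim_n P\{\mathcal{C}_n \le \hat c\} = P\{\mathcal{C} \le c(\alpha)\} = \alpha$ if $c(\alpha) > 0$, and
$\liminf_n P\{\Theta_I \subseteq C_n(\hat c)\} = \liminf_n P\{\mathcal{C}_n \le \hat c\} \ge P\{\mathcal{C} = 0\} \ge \alpha$ if $c(\alpha) = 0$.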
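
When the limit variable has a tractable form, the critical value $c(\alpha)$ can be approximated by simulation. The sketch below does this for the limit of Example 1, $\max[(W_1)_+^2, (W_2)_-^2]$ with $(W_1, W_2)' = N(0, \Omega)$. The covariance matrix used in the usage line and the function names are assumptions for illustration; in practice $\Omega$ would be replaced by an estimate.

```python
import numpy as np

def simulate_limit_statistic(omega, draws=200_000, seed=0):
    """Draws of max((W1)_+^2, (W2)_-^2) with (W1, W2) ~ N(0, omega)."""
    rng = np.random.default_rng(seed)
    w = rng.multivariate_normal(np.zeros(2), omega, size=draws)
    pos_part_sq = np.maximum(w[:, 0], 0.0) ** 2   # (W1)_+^2
    neg_part_sq = np.minimum(w[:, 1], 0.0) ** 2   # (W2)_-^2
    return np.maximum(pos_part_sq, neg_part_sq)

def critical_value(omega, alpha, draws=200_000, seed=0):
    """Simulated alpha-quantile of the limit statistic."""
    return float(np.quantile(simulate_limit_statistic(omega, draws, seed), alpha))

omega = np.eye(2)                      # hypothetical covariance matrix
c_alpha = critical_value(omega, alpha=0.95)
print(c_alpha)                         # the region C_n(c_alpha) would then be reported
```

With independent unit-variance components the quantile solves $\Phi(\sqrt{c})^2 = \alpha$, which gives a simple analytical check on the simulation.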

3.5. Generic Estimation of the Critical Value based on Subsampling. This section
develops a generic subsampling method for consistent estimation of the critical value. The
method estimates the quantiles of $\mathcal{C}_n$ using many data subsamples of size $b$. The following
condition facilitates the construction.

Condition C.5 (Approximability of $\mathcal{C}_n$). For $\mathcal{C}_n(\delta_n) := \sup_{\Theta_I^{\delta_n/a_n^{1/\gamma}}} a_n Q_n$, we have that
$P[\mathcal{C}_n(\delta_n) \le c] = P[\mathcal{C} \le c] + o(1)$ for any $\delta_n \downarrow 0$ and any $c > 0$. If C.3 holds, in
addition require that this condition holds for any $\delta_n \uparrow 0$.

Section 4 verifies Condition C.5 for models of Section 2. C.5 implies that it suffices to apply
subsampling to a feasible statistic $\sup_{C_n(\hat c)} a_b Q_b$ in place of the infeasible statistic $\sup_{\Theta_I} a_b Q_b$.

Generic Subsampling Algorithm. At a preliminary stage, for cases when the data $\{W_t\}$ are an
i.i.d. sequence, consider all subsets of size $b \ll n$.^14 Denote the number of subsets by $B_n$. For
cases when $\{W_t\}$ is a stationary strongly mixing time series, construct $B_n = n - b + 1$ subsets of
size $b$ of the form $\{W_j, \ldots, W_{j+b-1}\}$. The algorithm has four steps:
(1) Initialize some starting value $\hat c_0$, which can be data-dependent, such that $\hat c_0 \to_p c_0 \ge 0$.
Set $\kappa_n = \ln n$. If C.3 is known to hold, we can also set $\hat c_0 = 0$ and $\kappa_n = 0$.
(2) Compute $\hat c_1$ as the $\alpha$-quantile of the sample
$\{\mathcal{C}_{j,b,n} := \sup_{\theta \in C_n(\hat c)} a_b Q_{j,b,n}(\theta),\ j = 1, \ldots, B_n\}$, using $\hat c = \hat c_0 + \kappa_n$, where $Q_{j,b,n}$ denotes the
criterion function evaluated using the $j$-th subsample.
(3) (Optional/Asymptotically Equivalent Iterations) Repeat Step 2 for $\ell = 2, \ldots, L$ by computing
$\hat c_\ell$ from Step 2 using $\hat c = \hat c_{\ell-1} + \kappa_n$.
(4) Report $C_n(\hat c_L + \kappa_n)$ as a consistent estimator and a confidence region. Report $C_n(\hat c_L)$ as
a confidence region. (The latter region may be inconsistent as an estimator if C.3 does not
hold.)
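
The four steps above can be sketched in code for the interval criterion of Example 1 with i.i.d. data, $a_n = n$, and a random selection of subsamples. This is only an illustrative sketch under stated assumptions: the simulated data, the grid approximation of $\Theta$, the subsample size, the number of subsamples, and all function names are hypothetical choices rather than recommendations from the paper.

```python
import numpy as np

def q_n(theta, y1, y2):
    """Example 1 criterion on a theta grid, computed from data arrays y1, y2."""
    return (np.maximum(y1.mean() - theta, 0.0) ** 2
            + np.minimum(y2.mean() - theta, 0.0) ** 2)

def subsample_critical_value(y1, y2, grid, alpha, b, n_sub, c0, kappa, n_iter, rng):
    """Steps (1)-(3): c_l is the alpha-quantile of sup over C_n(c_{l-1} + kappa) of b * Q_{j,b,n}."""
    n = len(y1)
    full = n * q_n(grid, y1, y2)                  # a_n Q_n on the grid, a_n = n
    # Draw the subsamples once and store a_b Q_{j,b,n} on the grid, a_b = b.
    sub_stats = np.empty((n_sub, len(grid)))
    for j in range(n_sub):
        idx = rng.choice(n, size=b, replace=False)
        sub_stats[j] = b * q_n(grid, y1[idx], y2[idx])
    c = c0                                        # Step 1: starting value
    for _ in range(n_iter):                       # Steps 2 and 3
        region = full <= c + kappa                # grid version of C_n(c + kappa)
        if not region.any():
            raise ValueError("contour set is empty on the grid")
        c = float(np.quantile(sub_stats[:, region].max(axis=1), alpha))
    return c

rng = np.random.default_rng(0)
n = 1000
y1 = rng.normal(0.0, 1.0, n)                      # hypothetical data, E[Y1] = 0
y2 = rng.normal(1.0, 1.0, n)                      # hypothetical data, E[Y2] = 1
grid = np.linspace(-1.0, 2.0, 601)                # discretized Theta = [-1, 2]
kappa_n = np.log(n)

c_hat = subsample_critical_value(y1, y2, grid, alpha=0.95, b=100, n_sub=500,
                                 c0=0.0, kappa=kappa_n, n_iter=2, rng=rng)
full = n * q_n(grid, y1, y2)
conf_region = grid[full <= c_hat]                 # Step 4: C_n(c_L)
estimator = grid[full <= c_hat + kappa_n]         # Step 4: C_n(c_L + kappa_n)
print(c_hat, conf_region.min(), conf_region.max(), estimator.min(), estimator.max())
```

Reusing the same subsamples across iterations keeps the cost of additional iterations low, since only the region over which the supremum is taken changes.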


^14 In applications, since the number of such subsets is large, it suffices to consider a smaller number, $B_n$, of
randomly chosen subsets of size $b$ such that $B_n \to \infty$ as $n \to \infty$.

Remark 3.2. Chernozhukov, Hong, and Tamer (2002) discuss implementation and
computation in further detail. Using Example 2 as the basis for simulations, they find that a small
number of iterations, $L = 1$ or 2, setting $\hat c_0$ to the $\alpha$-quantile of a $\chi^2$ variable with degrees of
freedom equal to the number of moment equations, and using $b = 300$ and $b = 400$, led to
good coverage and estimation results for samples of size 1000 and 2000. Politis, Romano, and
Wolf (1999) describe calibration methods for choosing $b$ in practice.

Theorem 3.3 (General Validity of Subsampling). Suppose (a) $\{W_1, \ldots, W_n\}$ is either i.i.d. or
a stationary and strongly mixing series, (b) $b \to \infty$, $b/n \to 0$ at polynomial rates as $n \to \infty$,
and (c) $a_n \to \infty$ at least at a polynomial rate in $n$. Suppose C.1, C.2, C.4, and C.5 hold. Let
$\alpha \in (0, 1)$ denote the desired coverage level. Then, for any finite iteration $L$ of the algorithm
described above, the following is true: (1) $\hat c_L \to_p c(\alpha) := \inf\{c \ge 0 : P\{\mathcal{C} \le c\} \ge \alpha\}$, (2)
$\lim_n P\{\Theta_I \subseteq C_n(\hat c_L)\} = \alpha$ if $c(\alpha) > 0$, and $\liminf_n P\{\Theta_I \subseteq C_n(\hat c_L)\} \ge \alpha$ if $c(\alpha) = 0$.

Therefore, any finite iteration of the algorithm produces consistent estimates of $c(\alpha)$. The
iterations are asymptotically equivalent and thus, for the purposes of asymptotics, form a
single step procedure. The resulting regions $C_n(\hat c_L)$ cover $\Theta_I$ with $P$-probability $\alpha$ in large
samples. Further, in order to get confidence regions that also consistently estimate $\Theta_I$, we
should expand them, namely take $C_n(\hat c_L + \kappa_n)$ for $\kappa_n$ defined above. (When C.3 is known to
hold, we do not need to expand them, so we can set $\kappa_n = 0$.)

Remark 3.3. It follows from the proof of Theorem 3.3 that Conditions C.1 and C.4 alone
suffice for $C_n(\hat c_L)$ to cover $\Theta_I$ with $P$-probability at least $\alpha$.

Remark 3.4. If the researcher does not know whether C.3 holds, he can still use the algorithm
with the expansion constant $\kappa_n = \ln n$.

Remark 3.5 (Variations). Recently Sheikh (2006) proposed a step-down variant of our
algorithm, which is numerically equivalent to our algorithm, except that it employs the choice
of constants $\hat c_0 \propto a_n$ and $\kappa_n = 0$, where the very conservative choice $\hat c_0 \propto a_n$ is used to avoid
estimation of the set $\Theta_I$. The finitely-iterated step-down algorithm is typically more
conservative (hence less powerful) than the original procedure. The infinitely-iterated step-down
algorithm has the same asymptotic properties as our algorithm, but it is more computationally
expensive.^15

3.6. Asymptotics of $\mathcal{C}_n$ and Related Inferential Statistics. This section develops
methods for obtaining the limits of $\mathcal{C}_n$ and related inferential statistics that determine the
probabilities of false coverage. Such a task faces two major difficulties: one is the failure of the
usual stochastic equicontinuity conditions of the underlying empirical process and another is
the parameter on the boundary problem, as defined below. This section outlines a
framework for obtaining these limits by relying on concepts of stochastic equi-semi-continuity and
generalizations of Chernoff (1954) type conditions on the parameter space $\Theta$.

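Consider the statistic $\mathcal{C}_n(\delta) := \sup_{\Theta_I^{\delta/a_n^{1/\gamma}}} a_n Q_n$, where $\Theta_I^{\delta/a_n^{1/\gamma}}$ is the $\delta/a_n^{1/\gamma}$-expansion of $\Theta_I$.
Since $\mathcal{C}_n = \mathcal{C}_n(0)$, $\mathcal{C}_n$ is a special case of this statistic. Suppose that for each $\delta \ge 0$

$$\mathcal{C}_n(\delta) \to_d \mathcal{C}(\delta) \text{ in } \mathbb{R}. \qquad (3.4)$$

Relation (3.4) implies that the probability that the confidence region for $\Theta_I$ covers the false local
region $\Theta_I^{\delta/a_n^{1/\gamma}}$ satisfies

$$P\{\Theta_I^{\delta/a_n^{1/\gamma}} \subseteq C_n(c(\alpha))\} = P\{\sup_{\Theta_I^{\delta/a_n^{1/\gamma}}} a_n Q_n \le c(\alpha)\} \to P\{\mathcal{C}(\delta) \le c(\alpha)\}, \qquad (3.5)$$

as long as $c(\alpha)$ is a continuity point of the distribution of $\mathcal{C}(\delta)$. Then the asymptotic probability
of false coverage satisfies $P\{\mathcal{C}(\delta) \le c(\alpha)\} \le P\{\mathcal{C} \le c(\alpha)\}$, with strict inequality holding if
the distribution function of $\mathcal{C}(\delta)$ differs from the distribution function of $\mathcal{C}$ at $c(\alpha)$. From a
testing perspective, we can view $\Theta_I^{\delta/a_n^{1/\gamma}}$ as a local alternative to $\Theta_I$, so that statements about
false coverage translate in an obvious way to statements about local power.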


^15 The starting value $\hat c_0 \propto a_n$ in the step-down algorithm is very conservative. Iteration of the algorithm
reduces this critical value; in the limit of iteration, the critical value is essentially the same as the critical value
$c(\alpha) + o_p(1)$ produced by our algorithm. Clearly, when the number of iterations is insufficient, the
finitely-iterated step-down algorithm provides confidence regions that are more conservative than our confidence
regions. Thus, in practice our confidence regions are often smaller than the finitely-iterated step-down regions.
More formally, when C.3 holds, starting with $\hat c_0 = 0$ and $\kappa_n = 0$, our algorithm produces the asymptotically
valid critical value $\hat c_1 = c(\alpha) + o_p(1)$ in merely a single iteration, and $\hat c_1$ is less than the critical value produced
by the step-down variant in any finite number of iterations. When C.3 does not hold, the infinitely-iterated
step-down procedure, which aggressively sets $\kappa_n = 0$, asymptotically agrees with our critical value $c(\alpha) + o_p(1)$
with probability at least $\alpha$. However, conditional on the event that the step-down region does not cover $\Theta_I$,
which occurs with probability at most $1 - \alpha$, the step-down critical values may be smaller than $c(\alpha) + o_p(1)$.
Thus, since the discrepancy between the step-down regions and our regions occurs only when a Type I error does, the
discrepancy is irrelevant.

The analysis focuses on the asymptotic behavior of the empirical process:

$$\ell_n(\theta, \lambda) := a_n Q_n(\theta + \lambda/a_n^{1/\gamma}), \quad (\theta, \lambda) \in V_n^\delta, \qquad (3.6)$$

where

$$V_n^\delta := \{(\theta, \lambda) : \theta \in \Theta_I, \lambda \in V_n^\delta(\theta)\}, \quad V_n^\delta(\theta) := a_n^{1/\gamma}(\Theta - \theta) \cap B_\delta, \qquad (3.7)$$

where $B_\delta$ denotes the closed ball in $\mathbb{R}^d$ of radius $\delta$ centered at the origin and $a_n^{1/\gamma}(\Theta - \theta)$
denotes the parameter space translated by $\theta$ and multiplied by the scaling rate $a_n^{1/\gamma}$.^16 The
parameter $\lambda$ represents the local deviation from $\theta$ and ranges over the local parameter space
$V_n^\delta(\theta)$. The parameter $\theta$ ranges over the identified set $\Theta_I$. The inferential statistics in (3.4)
are suprema of the empirical process (3.6) over the set (3.7):

$$\mathcal{C}_n(\delta) = \sup_{(\theta, \lambda) \in V_n^\delta} \ell_n(\theta, \lambda).$$

The limit properties of $\mathcal{C}_n(\delta)$ will therefore depend on the limit properties of $V_n^\delta$. Observe
that $V_n^\delta$ is the graph of the correspondence $\theta \rightrightarrows V_n^\delta(\theta)$, defined over domain $\Theta_I$, and let $V_\infty^\delta$
denote the graph of some other correspondence $\theta \rightrightarrows V_\infty^\delta(\theta)$, also defined over domain $\Theta_I$.
The condition below requires that $V_n^\delta$ "converges" to $V_\infty^\delta$, where the notion of convergence is
motivated statistically.

Condition S.1 (Generalized Chernoff Regularity). (A) $Q_n$ is defined on a neighborhood $\Theta'$
of $\Theta$ in $\mathbb{R}^d$, and is jointly measurable in $\theta \in \Theta'$ and data $W_1, \ldots, W_n$ defined on a complete
probability space $(\Omega, \mathcal{F}, P)$. (B) For any $\epsilon > 0$ and $\delta \ge 0$ there exists $n_\epsilon$ such that for all
$n \ge n_\epsilon$, $P\{|\sup_{V_n^\delta} \ell_n - \sup_{V_\infty^\delta} \ell_n| > \epsilon\} < \epsilon$.

S.1(A) is needed to make sure that $\ell_n$ is well defined over $\Theta_I \times B_\delta$ for large $n$ and hence
over $V_n^\delta$. S.1(B) is obviously satisfied when $\delta = 0$, a case which is relevant for the asymptotics of
$\mathcal{C}_n = \mathcal{C}_n(0)$, since in this case $V_n^0 = V_\infty^0 = \Theta_I \times \{0\}$.

Next, suppose there exists $\delta > 0$ such that $B_\delta(\theta) \subseteq \Theta$ for each $\theta \in \Theta_I$, where $B_\delta(\theta)$ is a
closed ball in $\mathbb{R}^d$ of radius $\delta$ centered at $\theta$. Then,

$$V_n^\delta = V_\infty^\delta = \Theta_I \times B_\delta \text{ for all sufficiently large } n, \qquad (3.8)$$

and S.1(B) also holds trivially. This case will be called the parameter in the interior case. It
appears to be reasonable in many applications, where e.g. $\Theta$ is a rectangle or a convex body
in $\mathbb{R}^d$ and $\Theta_I$ is in the interior of $\Theta$. The case where the parameter in the interior condition
fails to hold will be called the parameter on the boundary case. This definition extends the
definition of Chernoff (1954) and Andrews (1999) to the present context. In this case the limit
graph $V_\infty^\delta$ will have a form that depends on the structure of $\Theta$.

^16 That is, $\Theta - \theta$ is the Minkowski difference of the set $\Theta$ and the set $\{\theta\}$.

The parameter on the boundary case arises in many problems. One example is the linear
instrumental variable model (Example 4) with $\Theta$ given by a rectangular region. There, the
identified set $\Theta_I$ and the boundary of $\Theta$ necessarily have common points, so that (3.8) does not
hold. Another example is the case where $\Theta$ is itself defined by a manifold, so that (3.8) does
not hold. Lemmas 4.1 and 4.2 in Section 4 derive $V_\infty^\delta$ for the moment condition models of
Section 2, covering the cases in which the parameter on the boundary problem does occur.

Condition S.2. (Weak Sup-Convergence) For any finite set $\Delta \subset [0, \infty)$,
$(\sup_{V_\infty^\delta} \ell_n, \delta \in \Delta) \to_d (\sup_{V_\infty^\delta} \ell_\infty, \delta \in \Delta)$ in $\mathbb{R}^{|\Delta|}$,^17 where $(\theta, \lambda) \mapsto \ell_\infty(\theta, \lambda)$ is a
non-negative stochastic process.

The process $\ell_\infty$ will be referred to as the sup limit of $\ell_n$. The sup convergence is more
general than uniform convergence, namely the convergence in $L^\infty(\Theta_I \times B_\delta)$, which is implied
by finite-dimensional convergence and stochastic equicontinuity of $\ell_n$.^18 In particular, the
uniform convergence fails in the moment inequality model, while the sup convergence does
not; see, for instance, the discussion of Example 1 below. The following condition is helpful in
verifying sup convergence.

Condition S.3. (Weak Finite-Dimensional Convergence and Approximability)

A. (Fidi Convergence) For any $\delta \ge 0$ and any finite subset $M$ of $V_\infty^\delta$,
$(\ell_n(\theta, \lambda), (\theta, \lambda) \in M) \to_d (\ell_\infty(\theta, \lambda), (\theta, \lambda) \in M)$ in $\mathbb{R}^{|M|}$, where $(\theta, \lambda) \mapsto \ell_\infty(\theta, \lambda)$ is a
non-negative stochastic process.

B. (Fidi Approximability) For any $\epsilon > 0$ and $\delta \ge 0$ there is a finite subset $M(\epsilon)$ of $V_\infty^\delta$ such
that for all $n \in [n_\epsilon, \infty]$: $P\{\sup_{V_\infty^\delta} \ell_n - \max_{M(\epsilon)} \ell_n > \epsilon\} < \epsilon$.

These conditions imply that the finite-dimensional limit and the sup-limit coincide.
Otherwise, the two limits may disagree in general. The finite-dimensional approximability is
motivated by, and extends, Knight's (1999) notion of stochastic equi-semi-continuity to
set-identified models.

^17 Here $|\Delta|$ denotes the cardinality of the finite set $\Delta$.

^18 This notion of sup convergence could be modified to yield what is known as weak hypo-convergence,
which may then be used to study the convergence of hypo-graphs of $\ell_n$ to those of $\ell_\infty$ as random closed sets,
e.g. extending the approach in Molchanov (2005) for the present problem. However, the weak convergence of
hypo-graphs is not of interest per se in this paper.

Lemma 3.2. (1) Conditions S.1 and S.2 imply (3.4) with the limit variables given by
$\mathcal{C}(\delta) = \sup_{V_\infty^\delta} \ell_\infty$, in particular $\mathcal{C} = \mathcal{C}(0) = \sup_{\theta \in \Theta_I} \ell_\infty(\theta, 0)$. (2) Condition S.3 implies Condition S.2.

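Example 1 (contd.) In Example 1, $Q_n(\theta) = (E_n[Y_1] - \theta)_+^2 + (E_n[Y_2] - \theta)_-^2$. Then
$\ell_n(\theta, \lambda) = n(E_n[Y_1] - \theta - \lambda/\sqrt{n})_+^2 + n(E_n[Y_2] - \theta - \lambda/\sqrt{n})_-^2$. Suppose that
$(\sqrt{n}(E_n[Y_1] - E_P[Y_1]), \sqrt{n}(E_n[Y_2] - E_P[Y_2]))' \to_d (W_1, W_2)' = N(0, \Omega)$. Then the
finite-dimensional limit of $\ell_n(\theta, \lambda)$ is given by

$$\ell_\infty(\theta, \lambda) = (W_1 - \lambda)_+^2 \, 1(\theta = E_P[Y_1]) + (W_2 - \lambda)_-^2 \, 1(\theta = E_P[Y_2]).$$

The limit is not continuous in $\theta$ at $\theta = E_P[Y_1]$ and at $\theta = E_P[Y_2]$, hence $\ell_n(\theta, \lambda)$ cannot
be stochastically equicontinuous and uniform convergence fails. Suppose that
$\Theta_I = [E_P[Y_1], E_P[Y_2]]$ is in the interior of $\Theta$. Then S.1 holds, since $V_n^\delta = V_\infty^\delta = \Theta_I \times B_\delta$ for
all sufficiently large $n$. Also, finite-dimensional approximability S.3(B) can be easily
verified. Therefore S.2 holds, so that the finite-dimensional limit $\ell_\infty(\theta, \lambda)$ is also the sup-limit of
$\ell_n(\theta, \lambda)$. By Lemma 3.2 we have that $\mathcal{C}_n(\delta) \to_d \mathcal{C}(\delta) = \sup_{(\theta, \lambda) \in \Theta_I \times B_\delta} \ell_\infty(\theta, \lambda)$, in particular

$$\mathcal{C}_n \to_d \mathcal{C} = \sup_{\theta \in \Theta_I} \ell_\infty(\theta, 0) = \max\left((W_1)_+^2, (W_2)_-^2\right).$$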
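
The finite-sample counterpart of this limit can be checked directly. Because $Q_n$ is convex in $\theta$ in Example 1, the supremum of $n Q_n$ over $\Theta_I = [E_P[Y_1], E_P[Y_2]]$ is attained at an endpoint, and it equals the maximum of the two endpoint terms whenever $E_n[Y_2] \ge E_P[Y_1]$ and $E_n[Y_1] \le E_P[Y_2]$. The sketch below compares a grid supremum with that endpoint formula; the numerical inputs and function names are assumptions for illustration.

```python
import numpy as np

def q_n(theta, y1_bar, y2_bar):
    """Example 1 criterion: (E_n[Y1] - theta)_+^2 + (E_n[Y2] - theta)_-^2."""
    theta = np.asarray(theta, dtype=float)
    return np.maximum(y1_bar - theta, 0.0) ** 2 + np.minimum(y2_bar - theta, 0.0) ** 2

def sup_statistic(y1_bar, y2_bar, mu1, mu2, n, points=1001):
    """Sup of n * Q_n over Theta_I = [mu1, mu2], on a grid that contains both endpoints."""
    grid = np.linspace(mu1, mu2, points)
    return float(np.max(n * q_n(grid, y1_bar, y2_bar)))

def endpoint_formula(y1_bar, y2_bar, mu1, mu2, n):
    """max((sqrt(n)(y1_bar - mu1))_+^2, (sqrt(n)(y2_bar - mu2))_-^2)."""
    w1 = np.sqrt(n) * (y1_bar - mu1)
    w2 = np.sqrt(n) * (y2_bar - mu2)
    return float(max(max(w1, 0.0) ** 2, min(w2, 0.0) ** 2))

# Hypothetical sample means close to the population means mu1 = 0 and mu2 = 1.
print(sup_statistic(0.05, 0.98, 0.0, 1.0, n=100))
print(endpoint_formula(0.05, 0.98, 0.0, 1.0, n=100))
```

Replacing the scaled deviations by their normal limits in the endpoint formula gives the limit variable stated above.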

4.  Analysis  of  Moment  Condition  Models 

4.1. Moment Equalities. We begin the discussion with moment equalities. Recall the
moment-equality set-up in Section 2, where the identification region takes the form
$\Theta_I = \{\theta \in \Theta : E_P[m_i(\theta)] = 0\}$. Suppose there exist positive constants $C$ and $\delta$ such that for all
$\theta \in \Theta$

$$\|E_P[m_i(\theta)]\| \ge C \cdot (d(\theta, \Theta_I) \wedge \delta). \qquad (4.1)$$

This is a partial identification condition, which states that once $\theta$ is bounded away from $\Theta_I$,
the moment equations are bounded away from zero.

In the point-identified case, the full rank and continuity of the Jacobian $\nabla_\theta E_P[m_i(\theta)]$ near $\Theta_I$ ordinarily imply (4.1). In the set-identified case, the Jacobian may be degenerate, which requires a more involved condition (4.1). For example, in the linear IV model of Example 4 we have that $E_P[m_i(\theta)] = E_P[ZX'](\theta-\theta^*)$, where $\theta^*$ is the closest point to $\theta$ in $\Theta_I$. Provided that $\|\theta^*-\theta\| > 0$, the vector $(\theta-\theta^*)$ is orthogonal to the hyperplane $\{v : E_P[ZX']v = 0\}$. Hence if the rank of $E_P[ZX']$ is non-zero, we have $\|E_P[ZX'](\theta-\theta^*)\| \ge C\cdot\|\theta-\theta^*\|$, where $C$ is the square root of the minimal positive eigenvalue of $E_P[XZ']E_P[ZX']$.
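A small numerical check can make the eigenvalue bound concrete. The sketch below uses a hypothetical rank-one $2\times 2$ matrix `A` standing in for $E_P[ZX']$ (the numbers are illustrative only) and verifies that $\|Av\| \ge C\|v\|$ for a vector orthogonal to the null space, with $C$ the square root of the smallest positive eigenvalue of $A'A$.

```python
import math

def mat_vec(a, v):
    """Matrix-vector product for nested lists."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))

def min_positive_eig_sym2(s, tol=1e-12):
    """Smallest strictly positive eigenvalue of a symmetric 2x2 matrix."""
    tr = s[0][0] + s[1][1]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    eigs = [tr / 2.0 - disc, tr / 2.0 + disc]
    return min(e for e in eigs if e > tol)

# Hypothetical degenerate Jacobian (rank 1), standing in for E_P[ZX'].
A = [[1.0, 2.0], [2.0, 4.0]]
AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]
C = math.sqrt(min_positive_eig_sym2(AtA))

v = [1.0, 2.0]        # orthogonal to the null space {v : v1 + 2*v2 = 0}
u = [2.0, -1.0]       # lies in the null space, so the bound cannot hold for it

lhs = norm(mat_vec(A, v))
rhs = C * norm(v)
null_image = norm(mat_vec(A, u))
```

The null-space direction `u` is mapped to zero, which is exactly why the bound is stated in terms of the distance to $\Theta_I$ rather than to a single point.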

The main stochastic assumption is that $\{m_i(\theta), \theta\in\Theta'\}$ is a P-Donsker class of functions, where $\Theta'$ is a neighborhood of $\Theta$ in $\mathbb{R}^d$. By this we mean the following: (1) In the metric space $L^\infty(\Theta')$

\[ G_n[m_i(\theta)] := \sqrt{n}\left(E_n[m_i(\theta)] - E_P[m_i(\theta)]\right) \Rightarrow \Delta(\theta), \tag{4.2} \]

where $\Delta(\theta)$ is a mean zero Gaussian process with a.s. continuous paths and $\mathrm{Var}_P[\Delta(\theta)] > 0$ for each $\theta\in\Theta'$. (2) The probability space $(\Omega,\mathcal{F},P)$ is rich enough (or has been suitably augmented)^19 so that there exists a map $\Delta_n : \Omega \to L^\infty(\Theta')$ such that $\Delta_n(\theta) =_d G_n[m_i(\theta)]$ and $\Delta_n(\theta) = \Delta(\theta) + o_p(1)$ in $L^\infty(\Theta')$.^20 The second condition does not involve loss of generality due to the Skorohod-Dudley-Wichura Construction, see Theorem 1.10.4 in van der Vaart and Wellner (1996) or Dudley (1985). Other conditions are given in the following assumption.

Condition M.1. Suppose the following conditions hold for the moment equality model of Section 2: (a) $\Theta$ is a non-empty compact subset of $\mathbb{R}^d$, and the real-valued criterion function $Q_n(\theta)$ is defined on a neighborhood $\Theta'$ of $\Theta$ in $\mathbb{R}^d$, and is jointly measurable in $\theta\in\Theta'$ and data $W_1,\dots,W_n$ defined on a complete probability space $(\Omega,\mathcal{F},P)$, (b) $\Theta$ is such that the graph of the local parameter space $V_n^\delta$ converges to some set $V_\infty^\delta$ in the Hausdorff metric, where $V_\infty^\delta$ is non-decreasing in $\delta > 0$, (c) $\{m_i(\theta), \theta\in\Theta'\}$ satisfies the P-Donsker condition stated above, (d) $E_P[m_i(\theta)]$ satisfies the partial identification condition (4.1) and has a continuous Jacobian $G(\theta) = \nabla_\theta E_P[m_i(\theta)]$ for each $\theta\in\Theta'$, and (e) $W_n(\theta) = W(\theta) + o_p(1)$ uniformly in $\theta\in\Theta'$, where $W(\theta)$ is positive definite and continuous for all $\theta\in\Theta'$.

Most of these assumptions are conventional. We need them to verify C.1, C.2, C.4, C.5, and other main conditions. Condition M.1(b) is a generalization of the Chernoff (1954) condition, which is needed for the analysis of false coverage, as discussed in Section 3.6, and for the second part of Theorem 4.1 below. M.1(b) also holds trivially in the parameter-in-the-interior case, as defined in Section 3.6, in which case $V_\infty^\delta = \Theta_I \times B_\delta$. M.1(b) can be replaced by


^19 We shall use $(\Omega,\mathcal{F},P)$ to denote the augmented probability space.

^20 Notation $=_d$ means equality in law: given two maps $X$ and $Y$ that map $\Omega$ to a metric space $\mathbb{D}$, $X =_d Y$ if $E_{P^*}[f(X)] = E_{P^*}[f(Y)]$ for every bounded $f : \mathbb{D} \mapsto \mathbb{R}$, where $E_{P^*}$ denotes outer expectation with respect to $P$, see van der Vaart and Wellner (1996), p. 60.
the classical assumption that $\Theta$ is convex, in which case $V_\infty^\delta$ has a very simple form stated in Lemma 4.1. The convergence imposed in M.1(b), known in variational analysis as graphical convergence of correspondences, is a fairly weak notion of convergence for correspondences; for instance, it is considerably weaker than the uniform convergence $\sup_{\theta\in\Theta_I} d_H(V_n^\delta(\theta), V_\infty^\delta(\theta)) = o(1)$ (Rockafellar and Wets 1998). Lemma 4.1 stated below provides further discussion of M.1(b).

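Theorem 4.1 (Moment Equations). (1) Conditions M.1(a,c,d,e) imply C.1, C.2, C.4, C.5 with $\gamma = 2$, $a_n = n$, and $b_n = \sqrt{n}$. If condition M.1(b) also holds, then S.1-S.3 hold, and the sup-limit of $\ell_n(\theta,\lambda) := nQ_n(\theta+\lambda/\sqrt{n})$ is given by: $\ell_\infty(\theta,\lambda) = \|(\Delta(\theta)+G(\theta)\lambda)'W^{1/2}(\theta)\|^2$. In particular,

\[ \mathcal{C} := \sup_{\theta\in\Theta_I} \ell_\infty(\theta,0) = \sup_{\theta\in\Theta_I} \|\Delta(\theta)'W^{1/2}(\theta)\|^2, \tag{4.3} \]

where $\Delta(\theta)$ is a zero-mean Gaussian process defined in (4.2).

(2) When $\tilde{Q}_n(\theta) = Q_n(\theta) - \inf_{\theta'\in\Theta} Q_n(\theta')$ is used for inference, condition M.1 implies C.1, C.2, C.4, and C.5, with $\gamma = 2$, $a_n = n$, $b_n = \sqrt{n}$, and the sup-limit of $\tilde{\ell}_n(\theta,\lambda) := nQ_n(\theta+\lambda/\sqrt{n}) - n\inf_{\theta'\in\Theta} Q_n(\theta')$ is given by: $\tilde{\ell}_\infty(\theta,\lambda) = \ell_\infty(\theta,\lambda) - \inf_{(\theta',\lambda')\in V_\infty^\infty} \ell_\infty(\theta',\lambda')$, where $V_\infty^\infty := \lim_{\delta\uparrow\infty} V_\infty^\delta$. In particular,

\[ \mathcal{C} := \sup_{\theta\in\Theta_I} \tilde{\ell}_\infty(\theta,0) = \sup_{\theta\in\Theta_I} \|\Delta(\theta)'W^{1/2}(\theta)\|^2 - \inf_{(\theta',\lambda')\in V_\infty^\infty} \|(\Delta(\theta')+G(\theta')\lambda')'W^{1/2}(\theta')\|^2. \tag{4.4} \]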

The most basic implications are that, for $\hat{c} \to_p c(\alpha)$, the confidence region $C_n(\hat{c})$ has asymptotic coverage $\alpha$ (it need not be consistent). The estimator $C_n(\hat{c} + \ln n)$ is consistent at the $\sqrt{\ln n/n}$ rate with respect to the Hausdorff distance, and has asymptotic coverage of 1. The sup-limit $\ell_\infty$ of the empirical process $\ell_n$ obtained by the theorem describes the limit behavior of related inferential statistics, following Section 3.6. Lastly, note that compactness of $\Theta_I$ ensures that the limit variable $\mathcal{C}$ is finite.

The  quantiles  of  C  in  (4.3)  can  be  estimated  by  the  generic  subsampling  method  of  Section 
3.5  or  by  simulating  the  limit  distribution.  The  latter  method  is  generally  more  accurate  than 
subsampling. 

Remark 4.1 (Quantiles of (4.3) by Simulation). For instance, if the data are i.i.d., we can estimate the distribution of $\mathcal{C}$ by making the simulation draws of

\[ \mathcal{C}_n^* := \sup_{\theta\in\hat{\Theta}_I} \mathcal{C}_n^*(\theta), \qquad \mathcal{C}_n^*(\theta) := \|\Delta_n^*(\theta)'W_n^{1/2}(\theta)\|^2, \]

where $\Delta_n^*(\theta) = n^{-1/2}\sum_{i=1}^n [m_i(\theta) z_i]$, and $(z_i, i\le n)$ is an $n$-vector of i.i.d. $N(0,1)$ variables. Note that $\Delta_n^*(\theta)$ is a zero-mean Gaussian process in $L^\infty(\Theta)$ with covariance function $E_n[m_i(\theta)m_i(\theta')']$. Then $E_n[m_i(\theta)m_i(\theta')'] = E_P[m_i(\theta)m_i(\theta')'] + o_p(1)$ uniformly in $(\theta,\theta')\in\Theta\times\Theta$. Thus the distance between the law of $\Delta_n^*(\theta)$ and the law of $\Delta(\theta)$, in the weak convergence metric, converges in probability to zero. Since $\Delta_n^*(\theta)$ is stochastically equicontinuous, $d_H(\hat{\Theta}_I,\Theta_I) = o_p(1)$, and $\sup_{\theta\in\Theta}\|W_n(\theta)-W(\theta)\| = o_p(1)$, the distance between the law of $\mathcal{C}_n^*$ and the law of $\mathcal{C}$, in the weak convergence metric, converges in probability to zero. The same argument applies if the distribution of $\Delta(\theta)$ is estimated by the nonparametric bootstrap with recentering.
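The simulation in Remark 4.1 can be sketched in a few lines. This is an illustration under simplifying assumptions that are not in the text: the estimated set is replaced by a finite grid of parameter values, the weight matrix is taken to be diagonal, and the moment functions are supplied as a precomputed array `m[i][k][j]` (observation `i`, grid point `k`, moment `j`). All function names are hypothetical.

```python
import math
import random

def multiplier_stat(m, z, w=None):
    """sup over grid points of ||Delta*(theta)' W^{1/2}(theta)||^2,
    where Delta*(theta) = n^{-1/2} * sum_i m_i(theta) * z_i.

    m[i][k][j]: moment j for observation i at grid point k.
    w[k][j]: assumed diagonal weights (defaults to identity).
    """
    n, K, J = len(m), len(m[0]), len(m[0][0])
    best = 0.0
    for k in range(K):
        total = 0.0
        for j in range(J):
            d = sum(m[i][k][j] * z[i] for i in range(n)) / math.sqrt(n)
            wt = 1.0 if w is None else w[k][j]
            total += wt * d * d
        best = max(best, total)
    return best

def simulated_quantile(m, alpha, draws=1000, seed=0, w=None):
    """Empirical alpha-quantile of the simulated statistic."""
    rng = random.Random(seed)
    n = len(m)
    stats = sorted(
        multiplier_stat(m, [rng.gauss(0.0, 1.0) for _ in range(n)], w)
        for _ in range(draws)
    )
    return stats[min(draws - 1, int(alpha * draws))]
```

Holding the data fixed and redrawing only the multipliers `z` is what makes each draw cheap: the moment array is computed once.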

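Remark 4.2 (Quantiles of (4.4) by Simulation). We can estimate the quantiles of $\mathcal{C}$ in (4.4) by simulating the distribution of the variable

\[ \mathcal{C}_n^* := \sup_{\theta\in\hat{\Theta}_I} \mathcal{C}_n^*(\theta), \qquad \mathcal{C}_n^*(\theta) := \|\Delta_n^*(\theta)'W_n^{1/2}(\theta)\|^2 - \inf_{(\theta',\lambda)\in V_\infty^\infty} \|(\Delta_n^*(\theta')+\hat{G}(\theta')\lambda)'W_n^{1/2}(\theta')\|^2, \]

where $\hat{G}(\theta)$ is a uniformly consistent estimate of $\nabla_\theta E_P[m_i(\theta)]$.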

The form of $\Theta$ plays an important role, as it determines the limit form of the local parameter spaces and of the statistics $\mathcal{C}_n(\delta)$, whose behavior determines the probability of false coverage.

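Lemma 4.1 (Chernoff Regularity for Moment Equations). Sufficient conditions for $V_n^\delta$ to converge in the Hausdorff metric to some set $V_\infty^\delta$, which is non-decreasing in $\delta > 0$, include either one of the following: (1) Suppose there exists $\delta > 0$ such that $B_\delta(\theta)\subset\Theta$ for each $\theta\in\Theta_I$. Then $V_n^\delta = V_\infty^\delta = \Theta_I\times B_\delta$ for all sufficiently large $n$. (2) Suppose $\Theta = \Theta_g \cap \bigcap_{r=1}^R \{\theta\in\mathbb{R}^d : g_r(\theta) = 0\}$, where $\Theta_g$ is a compact and convex set, and each $g_r$ is defined on $\Theta_g'$, a neighborhood of $\Theta_g$ in $\mathbb{R}^d$, and has a continuous Jacobian $\nabla g_r(\theta)$ with a constant row rank over $\Theta_g'$. Then the above convergence holds with $V_\infty^\delta$ that has $V_\infty^\delta(\theta) = \{\lambda\in B_\delta : \lambda\in\sqrt{n'}(\Theta_g-\theta) \text{ for some } n'\ge 1,\ \nabla_\theta g_r(\theta)\lambda = 0,\ r = 1,\dots,R\}$.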

Lemma 4.1 provides sufficient conditions for S.1 to hold. Case (1) is the parameter-in-the-interior case that may arise e.g. when $\Theta_I$ is a collection of isolated points that lie in the interior of $\Theta$ (defined relative to $\mathbb{R}^d$), in which case $V_\infty^\delta$ has a trivial form. Case (2) addresses the parameter-on-the-boundary case, and covers the convex parameter space $\Theta_g$ as well as the parameter space generated by an intersection of $\Theta_g$ with several manifolds representing various restrictions imposed on the parameter space; in this case, the limit local parameter space $V_\infty^\delta(\theta)$ is given (up to a closure) by a tangent cone of $\Theta$ at $\theta$ intersected with the ball $B_\delta$. This extends the results obtained for the point-identified cases (Chernoff 1954, Geyer 1994, Andrews 1997).

4.2. Moment Inequalities. Recall the setup of the moment-inequality model in Section 2. We have $\Theta_I = \{\theta\in\Theta : \|E_P[m_i(\theta)]\|_+ = 0\}$. Assume there are positive constants $(C,\delta,\eta)$ such that for all $\theta\in\Theta$

\[ \|E_P[m_i(\theta)]\|_+ \ge C\cdot(d(\theta,\Theta_I)\wedge\delta), \tag{4.5} \]

and, recalling that $m_i(\theta)$ is a $J$-vector with components $(m_{ij}(\theta), j = 1,\dots,J)$,

\[ \max_{j\le J} E_P[m_{ij}(\theta)] \le -C\cdot(d(\theta,\Theta\setminus\Theta_I)\wedge\delta) \ \text{ for all } \theta\in\Theta_I; \qquad d_H(\Theta_I^{-\epsilon},\Theta_I) \le \epsilon \ \text{ for all } \epsilon\in[0,\eta]. \tag{4.6} \]

Equation (4.5) is the partial identification condition. Equation (4.6) states that the moment equations are strictly negative for all $\theta$ in the contractions of $\Theta_I$ and that these contractions $\Theta_I^{-\epsilon}$ can approximate $\Theta_I$. Equation (4.6) need not hold generally, but it is satisfied in many empirical examples listed in Section 2.^21

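In order to state the regularity conditions define

\[ \Theta_{\mathcal{J}} := \{\theta\in\Theta_I : E_P[m_{ij}(\theta)] = 0 \ \forall j\in\mathcal{J},\ E_P[m_{ij}(\theta)] < 0 \ \forall j\in\mathcal{J}^c\}, \]

where $\mathcal{J}$ is any (non-empty) subset of $\{1,\dots,J\}$ and $\mathcal{J}^c$ is the complement of $\mathcal{J}$ relative to $\{1,\dots,J\}$.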

Condition M.2. Suppose the following conditions hold for the moment inequality model of Section 2: (a) $\Theta$ is a non-empty compact subset of $\mathbb{R}^d$, and the criterion function $Q_n(\theta)$ is defined on a neighborhood $\Theta'$ of $\Theta$ in $\mathbb{R}^d$, and is jointly measurable in $\theta\in\Theta'$ and data $W_1,\dots,W_n$ defined on a complete probability space $(\Omega,\mathcal{F},P)$, (b) $\Theta$ is such that the graph of the local parameter space^22 $V_n^\delta\,|\,\theta\in\Theta_{\mathcal{J}}$ converges to some set $V_\infty^\delta\,|\,\theta\in\Theta_{\mathcal{J}}$ in the Hausdorff metric, where $V_\infty^\delta\,|\,\theta\in\Theta_{\mathcal{J}}$ is non-decreasing in $\delta > 0$, for each $\mathcal{J}$, (c) $\{m_i(\theta), \theta\in\Theta'\}$ satisfies the P-Donsker condition stated in Section 4.1, (d) $E_P[m_i(\theta)]$ satisfies the partial identification condition (4.5) and has a continuous Jacobian $G(\theta) = \nabla_\theta E_P[m_i(\theta)]$ for each $\theta\in\Theta'$, (e) $W_n(\theta) = W(\theta) + o_p(1)$ uniformly in $\theta\in\Theta'$, where $W(\theta)$ is a diagonal matrix with positive diagonal elements and is continuous for all $\theta\in\Theta'$, and (f) condition (4.6) holds.


^21 A detailed illustration and verification of this condition for the linear moment inequality framework has been provided in the previous version of this paper (Chernozhukov, Hong, and Tamer 2002).

^22 The set $V_n^\delta\,|\,\theta\in\Theta_{\mathcal{J}}$ is defined as $\{(\theta,\lambda)\in V_n^\delta : \theta\in\Theta_{\mathcal{J}}\}$.
Most of these assumptions are conventional. We need them to verify C.1, C.2, C.4, C.5 and other main conditions. Condition M.2(b) is an assumption of Chernoff type, which is needed for the analysis of false coverage, as discussed in Section 3.6, and for the second part of Theorem 4.2 below. Lemma 4.2 stated below provides further discussion of M.2(b). Condition M.2(b) can be replaced by the classical assumption that $\Theta$ is convex, in which case $V_\infty^\delta$ has a very simple form stated in Lemma 4.1. Condition M.2(f) is needed to verify the "degenerate interior" condition C.3.

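Theorem 4.2 (Moment Inequalities). (1) Conditions M.2(a,c,d,e) imply C.1, C.2, C.4, C.5 with $\gamma = 2$, $a_n = n$, and $b_n = \sqrt{n}$. If further condition M.2(f) holds, then C.3 holds. If further condition M.2(b) holds, then S.1-S.3 hold, and the sup-limit of $\ell_n(\theta,\lambda) := nQ_n(\theta+\lambda/\sqrt{n})$ is given by: $\ell_\infty(\theta,\lambda) = \|(\Delta(\theta)+G(\theta)\lambda+\xi(\theta))'W^{1/2}(\theta)\|_+^2$. In particular,

\[ \mathcal{C} = \sup_{\theta\in\Theta_I} \ell_\infty(\theta,0) = \sup_{\theta\in\Theta_I} \|(\Delta(\theta)+\xi(\theta))'W^{1/2}(\theta)\|_+^2, \tag{4.7} \]

where $\Delta(\theta)$ is a zero-mean Gaussian process defined in (4.2) and $\xi(\theta) = (\xi_j(\theta), j\le J)$ with $\xi_j(\theta) = -\infty$ if $E_P[m_{ij}(\theta)] < 0$ and $\xi_j(\theta) = 0$ if $E_P[m_{ij}(\theta)] = 0$.

(2) When $Q_n(\theta) - \inf_{\theta'\in\Theta} Q_n(\theta')$ is used for inference, conditions M.2(a,b,c,d,e) imply C.1, C.2, C.4, C.5, and S.1-S.3. In particular, $\gamma = 2$, $a_n = n$, $b_n = \sqrt{n}$, and the sup-limit of $\tilde{\ell}_n(\theta,\lambda) := nQ_n(\theta+\lambda/\sqrt{n}) - n\inf_{\theta'\in\Theta} Q_n(\theta')$ is given by: $\tilde{\ell}_\infty(\theta,\lambda) = \ell_\infty(\theta,\lambda) - \inf_{(\theta',\lambda')\in V_\infty^\infty} \ell_\infty(\theta',\lambda')$, where $V_\infty^\infty := \lim_{\delta\uparrow\infty} V_\infty^\delta$. In particular $\mathcal{C} = \sup_{\theta\in\Theta_I} \tilde{\ell}_\infty(\theta,0)$, i.e.

\[ \mathcal{C} = \sup_{\theta\in\Theta_I} \|(\Delta(\theta)+\xi(\theta))'W^{1/2}(\theta)\|_+^2 - \inf_{(\theta',\lambda')\in V_\infty^\infty} \|(\Delta(\theta')+G(\theta')\lambda'+\xi(\theta'))'W^{1/2}(\theta')\|_+^2, \tag{4.8} \]

where the second term equals zero if M.2(f) holds.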

Therefore, for $\hat{c} \to_p c(\alpha)$, the region $C_n(\hat{c})$ is consistent at the $1/\sqrt{n}$ rate with respect to the Hausdorff distance as an estimator, and has asymptotic coverage $\alpha$ as a confidence region. The theorem also obtains the sup-limit $\ell_\infty$ of the empirical process $\ell_n$, which describes the limit behavior of the related inferential statistics. Following Section 3.6, the latter results are needed to describe the probability of false coverage.

The quantiles of $\mathcal{C}$ in (4.7) can be estimated by either the generic subsampling method of Section 3.5 or by simulating the limit distribution. The latter method is generally more accurate than subsampling.
Remark 4.3 (Quantiles of (4.7) by Simulation). If the data are i.i.d., we can simulate the limit distribution of $\mathcal{C}_n$ by making the simulation draws of

\[ \mathcal{C}_n^* := \sup_{\theta\in\hat{\Theta}_I} \mathcal{C}_n^*(\theta), \qquad \mathcal{C}_n^*(\theta) := \|(\Delta_n^*(\theta)+\hat{\xi}(\theta))'W_n^{1/2}(\theta)\|_+^2, \]

where $\Delta_n^*(\theta) = n^{-1/2}\sum_{i=1}^n [m_i(\theta) z_i]$, and $(z_i, i\le n)$ is an $n$-vector of i.i.d. $N(0,1)$ variables. Note that $\Delta_n^*(\theta)$ is a zero-mean Gaussian process in $L^\infty(\Theta)$ with covariance function $E_n[m_i(\theta)m_i(\theta')']$, as discussed in Remark 4.1. Here $\hat{\xi}(\theta) := (\hat{\xi}_j(\theta), j = 1,\dots,J)'$ with $\hat{\xi}_j(\theta) := -\infty$ if $E_n[m_{ij}(\theta)] \le -c_j\log n/\sqrt{n}$, and $\hat{\xi}_j(\theta) := 0$ if $E_n[m_{ij}(\theta)] > -c_j\log n/\sqrt{n}$, for some positive constants $c_j > 0$.
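The slackness rule in Remark 4.3 is easy to state in code. The sketch below is illustrative only: it evaluates the rule at a single parameter value, takes the weight matrix to be diagonal (as in M.2(e)), and uses hypothetical function names and a default constant `c_j = 1.0` that is not prescribed by the text.

```python
import math

def xi_hat(mbar_j, n, c_j=1.0):
    """Estimated slackness for moment j at one parameter value.

    mbar_j is the sample moment E_n[m_ij(theta)]. The moment is treated
    as slack (xi = -inf) when it lies below -c_j * log(n) / sqrt(n),
    and as binding (xi = 0) otherwise. c_j is a user-chosen constant.
    """
    threshold = -c_j * math.log(n) / math.sqrt(n)
    return float("-inf") if mbar_j <= threshold else 0.0

def inequality_stat(delta, xi, w):
    """||(delta + xi)' W^{1/2}||_+^2 for diagonal weights w.

    Components with xi = -inf contribute zero, so only the moments
    judged to be binding enter the statistic.
    """
    return sum(wj * max(dj + xj, 0.0) ** 2
               for dj, xj, wj in zip(delta, xi, w))
```

Setting a component to minus infinity simply removes that moment from the positive-part norm, which is how the limit in (4.7) ignores inequalities that hold strictly.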

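Remark 4.4 (Quantiles of (4.8) by Simulation). If the data are i.i.d., we can simulate the limit distribution of $\mathcal{C}$ by making the simulation draws of

\[ \mathcal{C}_n^* := \sup_{\theta\in\hat{\Theta}_I} \mathcal{C}_n^*(\theta), \qquad \mathcal{C}_n^*(\theta) := \|(\Delta_n^*(\theta)+\hat{\xi}(\theta))'W_n^{1/2}(\theta)\|_+^2 - \inf_{(\theta',\lambda)\in V_\infty^\infty} \|(\Delta_n^*(\theta')+\hat{G}(\theta')\lambda+\hat{\xi}(\theta'))'W_n^{1/2}(\theta')\|_+^2, \]

where $\hat{G}(\theta)$ is a uniformly consistent estimate of $\nabla_\theta E_P[m_i(\theta)]$.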

The form of $\Theta$ plays an important role in determining the limit form of the local parameter spaces and of the statistic $\mathcal{C}_n(\delta)$, whose behavior determines the probability of false coverage.

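Lemma 4.2 (Chernoff Regularity for Moment Inequalities). Sufficient conditions for the graph of the local parameter space $V_n^\delta\,|\,\theta\in\Theta_{\mathcal{J}}$ to converge in the Hausdorff metric to some set $V_\infty^\delta\,|\,\theta\in\Theta_{\mathcal{J}}$ that is non-decreasing in $\delta > 0$ include either one of the following: (1) Suppose there exists $\delta > 0$ such that $B_\delta(\theta)\subset\Theta$ for each $\theta\in\Theta_{\mathcal{J}}$. Then $V_n^\delta\,|\,\theta\in\Theta_{\mathcal{J}} = V_\infty^\delta\,|\,\theta\in\Theta_{\mathcal{J}} = \Theta_{\mathcal{J}}\times B_\delta$ for all sufficiently large $n$. (2) Suppose $\Theta = \Theta_g \cap \bigcap_{r=1}^R \{\theta\in\mathbb{R}^d : g_r(\theta) = 0\}$, where $\Theta_g$ is a compact and convex set, and each $g_r$ is defined on $\Theta_g'$, a neighborhood of $\Theta_g$ in $\mathbb{R}^d$, and has a continuous Jacobian $\nabla g_r(\theta)$ with a constant row rank over $\Theta_g'$. Then the above convergence holds with $V_\infty^\delta$ that has $V_\infty^\delta(\theta) = \{\lambda\in B_\delta : \lambda\in\sqrt{n'}(\Theta_g-\theta) \text{ for some } n'\ge 1,\ \nabla_\theta g_r(\theta)\lambda = 0,\ r = 1,\dots,R\}$.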

Lemma 4.2 is similar to Lemma 4.1, and comments similar to those stated after Lemma 4.1 apply here.

5.  Appendix  A:  Notation 
The  following  standard  notation  for  empirical  processes  will  be  used: 
The notions of convergence and outer and inner probabilities, $P^*$ and $P_*$, are defined as in van der Vaart and Wellner (1996). For instance, $\to_p$ denotes convergence in outer probability, wp $\to 1$ means "with the inner probability approaching 1"; the stochastic order notations $o_p(1)$ and $O_p(1)$ are with respect to $P^*$, unless otherwise stated. Notation $=_d$ means equality in law: given two elements $X$ and $Y$ that map $\Omega$ to a metric space $\mathbb{D}$, $X =_d Y$ if $E_{P^*}[f(X)] = E_{P^*}[f(Y)]$ for every bounded $f : \mathbb{D}\mapsto\mathbb{R}$, where $E_{P^*}$ denotes outer expectation with respect to $P$. Let $\|x\|_+ = \|\max(x,0)\|$ and $\|x\|_- = \|\max(-x,0)\|$, where in the case of vectors the max operations are elementwise. $B_\delta$ denotes a closed ball of diameter $\delta$ centered at the origin. In many instances, we use abbreviated notation $\sup_A f$ to mean $\sup_{a\in A} f(a)$, unless an ambiguity arises, in which case the latter notation is used. The Hausdorff distance between sets is defined as

\[ d_H(A,B) := \max\left[\sup_{a\in A} d(a,B),\ \sup_{b\in B} d(b,A)\right], \quad \text{where } d(b,A) := \inf_{a\in A}\|b-a\|, \]

and $d_H(A,B) := \infty$ if either $A$ or $B$ is empty. The $\epsilon$-expansion of $\Theta_I$ is defined as $\Theta_I^\epsilon := \{\theta\in\Theta : d(\theta,\Theta_I)\le\epsilon\}$, and the $\epsilon$-contraction of $\Theta_I$ as $\Theta_I^{-\epsilon} := \{\theta\in\Theta_I : d(\theta,\Theta\setminus\Theta_I)\ge\epsilon\}$, where $\epsilon > 0$.
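For finite sets of points the Hausdorff distance and the $\epsilon$-expansion defined above can be computed directly. The following is a minimal sketch; representing sets as lists of coordinate tuples and the function names are choices made here for illustration, not notation from the text.

```python
import math

def dist_point_set(x, s):
    """d(x, S) = min over y in S of the Euclidean distance ||x - y||."""
    return min(math.dist(x, y) for y in s)

def hausdorff(a, b):
    """Hausdorff distance between two finite point sets.

    Returns infinity if either set is empty, matching the convention
    stated in the notation section.
    """
    if not a or not b:
        return math.inf
    return max(max(dist_point_set(x, b) for x in a),
               max(dist_point_set(y, a) for y in b))

def expansion(theta_grid, theta_i, eps):
    """Points of the grid within distance eps of the set theta_i."""
    return [t for t in theta_grid if dist_point_set(t, theta_i) <= eps]
```

On a grid approximation of an interval-valued identified set, `hausdorff` gives the distance used in the consistency and rate statements, up to the grid resolution.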


6.  Appendix  B:  Proofs 

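6.1. Proof of Theorem 3.1. Proof of Part (1). Step (a). Wp $\to 1$, by C.1(e) and by $\hat{c}\to_p\infty$, $\sup_{\Theta_I} Q_n = O_p(1/a_n) \le \hat{c}/a_n$, which implies $\Theta_I\subseteq\hat{\Theta}_I$, which implies $\sup_{\theta\in\Theta_I} d(\theta,\hat{\Theta}_I) = 0$.

Step (b). For any $\epsilon > 0$, $\inf_{\Theta\setminus\Theta_I^\epsilon} Q_n =_{(i)} \inf_{\Theta\setminus\Theta_I^\epsilon} Q + o_p(1) \ge_{(ii)} \delta(\epsilon) + o_p(1)$ for some $\delta(\epsilon) > 0$, where (i) follows from uniform convergence as assumed in C.1(d) and (ii) from $Q$ being minimized on $\Theta_I$ as assumed in C.1(b). Similarly, $\sup_{\hat{\Theta}_I} Q = \sup_{\hat{\Theta}_I} Q_n + o_p(1) \le_{(i)} \hat{c}/a_n + o_p(1) =_{(ii)} o_p(1)$, where (i) holds by construction of $\hat{\Theta}_I$ and (ii) holds by $\hat{c}/a_n\to_p 0$. Hence $\sup_{\hat{\Theta}_I} Q < \delta(\epsilon) = \inf_{\Theta\setminus\Theta_I^\epsilon} Q$, where $\delta(\epsilon) > 0$, wp $\to 1$. Hence $\hat{\Theta}_I\cap(\Theta\setminus\Theta_I^\epsilon) = \emptyset$ wp $\to 1$, which implies $\hat{\Theta}_I\subseteq\Theta_I^\epsilon$. Given Step (a), this implies $\sup_{\theta\in\hat{\Theta}_I} d(\theta,\Theta_I)\le\epsilon$.

Combining Steps (a) and (b), $d_H(\hat{\Theta}_I,\Theta_I)\le\epsilon$ wp $\to 1$. Since $\epsilon > 0$ is arbitrary, the result is proven. □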

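Proof of Part (2). For any $\epsilon > 0$ there exist positive constants $(n_\epsilon,\kappa_\epsilon,\kappa)$ such that for all $n\ge n_\epsilon$, by $\hat{c}/a_n\to_p 0$ and $\hat{c}\to_p\infty$, $\hat{c}/a_n$ is small enough and $\hat{c}$ is large enough for C.2 to apply at the radius $(\hat{c}/(a_n\kappa))^{1/\gamma}$; so that, with probability larger than $1-\epsilon$,

\[ \inf_{\Theta\setminus\Theta_I^{(\hat{c}/(a_n\kappa))^{1/\gamma}}} a_n Q_n(\theta) \ge_{(i)} \kappa\cdot a_n\cdot\left[(\hat{c}/(a_n\kappa))^{1/\gamma}\wedge\delta\right]^\gamma =_{(ii)} \hat{c}, \]

where (i) follows by C.2 and (ii) follows by $\hat{c}/a_n\to_p 0$. By construction of $\hat{\Theta}_I$, we have that $\sup_{\hat{\Theta}_I} a_n Q_n\le\hat{c}$. Hence $\hat{\Theta}_I\subseteq\Theta_I^{(\hat{c}/(a_n\kappa))^{1/\gamma}}$. Hence, combining with Step (a) of the Proof of Part (1), we have that $d_H(\hat{\Theta}_I,\Theta_I)\le[\hat{c}/(a_n\kappa)]^{1/\gamma}$. Therefore $d_H(\hat{\Theta}_I,\Theta_I) = O_p([\hat{c}/a_n]^{1/\gamma})$. □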

Proof of Part (3). When $\Theta = \Theta_I$, by Step (a) of the Proof of Part (1), $\hat{\Theta}_I = \Theta$ wp $\to 1$, so $d_H(\hat{\Theta}_I,\Theta) = 0$ wp $\to 1$. □
6.2. Proof of Theorem 3.2. Proof of Part (1). Fix any $\epsilon\in(0,\eta]$. It follows that wp $\to 1$, $\Theta_I^{-\epsilon}\subseteq_{(i)} C_n(c')\subseteq_{(ii)} C_n(\hat{c})\subseteq_{(iii)}\Theta_I^\epsilon$, where $\hat{c}$ is from Theorem 3.1. Inclusion (i) follows since $\sup_{\Theta_I^{-\epsilon}} a_n Q_n = 0\le c'$ wp $\to 1$, by C.3(b), (ii) follows from $\hat{c}\ge c'$ wp $\to 1$, and (iii) follows from Part 1 of Theorem 3.1. Since $d_H(\Theta_I^{-\epsilon},\Theta_I)\le\epsilon$ by Condition C.3(a) and $d_H(\Theta_I^\epsilon,\Theta_I)\le\epsilon$ by definition of $\Theta_I^\epsilon$, it follows that $d_H(\Theta_I, C_n(c'))\le\epsilon$. Part (1) follows. □

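Proof of Part (2). Let $\epsilon_n = \inf\{\epsilon : \sup_{\Theta_I^{-\epsilon}} a_n Q_n = 0\}$. By Condition C.3(c), $\epsilon_n$ exists and $\epsilon_n = O_p(a_n^{-1/\gamma})$. Hence by C.3(a) and C.3(c), $d_H(\Theta_I^{-\epsilon_n},\Theta_I) = O_p(a_n^{-1/\gamma})$. Then we have that $\Theta_I^{-\epsilon_n}\subseteq C_n(\hat{c})\subseteq C_n(c')$ wp $\to 1$, where $c' > c$ and $\hat{c} = c + o_p(1)$. It can be shown, similarly to the Proof of Part (2) of Theorem 3.1, that for any $\epsilon > 0$ there exist $(\kappa_\epsilon, n_\epsilon)$ such that for all $n\ge n_\epsilon$ we have that $C_n(c')\subseteq\Theta_I^{(\kappa_\epsilon/a_n)^{1/\gamma}}$. Conclude that $d_H(C_n(\hat{c}),\Theta_I) = O_p(a_n^{-1/\gamma})$. □

Proof of Part (3). Under the stated condition, it is immediate that $\hat{\Theta}_I = \Theta_I = \Theta$ wp $\to 1$. Hence $d_H(\hat{\Theta}_I,\Theta) = 0$ wp $\to 1$. □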

6.3. Proof of Lemma 3.1. Proof of Part (1). Clearly, $\mathcal{C}_n = \sup_{\theta\in\Theta_I} a_n Q_n(\theta)\le c$ implies $\Theta_I\subseteq C_n(c) = \{\theta\in\Theta : a_n Q_n(\theta)\le c\}$. Conversely, $\Theta_I\subseteq C_n(c)$ implies $\mathcal{C}_n = \sup_{\theta\in\Theta_I} a_n Q_n(\theta)\le c$ by compactness of $\Theta_I$. □

Proof of Part (2). The result is elementary and its proof is therefore omitted. □
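The equivalence in Part (1) of Lemma 3.1, that the level set covers $\Theta_I$ exactly when the sup statistic is below the cutoff, can be checked on a grid. The sketch below is illustrative: the grid, the toy criterion, and the function names are assumptions made here, and a finite grid stands in for the compact set in the lemma.

```python
def sup_stat(theta_i, a_n, q_n):
    """C_n = sup over the identified set of a_n * Q_n(theta)."""
    return max(a_n * q_n(t) for t in theta_i)

def level_set(grid, a_n, q_n, c):
    """C_n(c) = {theta in grid : a_n * Q_n(theta) <= c}."""
    return [t for t in grid if a_n * q_n(t) <= c]

def covers(theta_i, region):
    """True if every point of theta_i belongs to region."""
    region_set = set(region)
    return all(t in region_set for t in theta_i)

# Toy criterion in the spirit of Example 1, with hypothetical sample
# means 0.6 and 1.4 standing in for E_n[Y_1] and E_n[Y_2].
def q_toy(t):
    return max(0.6 - t, 0.0) ** 2 + max(t - 1.4, 0.0) ** 2

grid = [0.0, 0.5, 1.0, 1.5, 2.0]
theta_i = [0.5, 1.0, 1.5]   # grid points of a hypothetical identified set
```

For any cutoff `c`, comparing `sup_stat(theta_i, 100, q_toy) <= c` with `covers(theta_i, level_set(grid, 100, q_toy, c))` gives the same truth value, which is the content of the lemma on this grid.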

6.4.  Proof  of  Theorem  3.3.  PROOF  OF  Part  (1).  It  suffices  to  prove  the  result  for  ci  only.  The 
proof  for  any  subsequent  step  is  identical  to  this  proof,  since  c\  is  allowed  to  be  data-dependent. 
Step  1  is  special  to  our  problem,  while  Step  2  is  standard  for  subsampling. 

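Step 1. By Theorem 3.1 or Theorem 3.2, wp $\to 1$ we have that $\Theta_I^{\eta_n}\subseteq C_n(\hat{c})\subseteq\Theta_I^{\epsilon_n}$, where $\epsilon_n := (\ln n/a_n)^{1/\gamma}$ and $\eta_n := -\epsilon_n$ if C.3 holds, and $\eta_n := 0$ if C.3 does not hold. Hence wp $\to 1$,

\[ \underline{\mathcal{C}}_{j,b,n} := \sup_{\Theta_I^{\eta_n}} a_b Q_{j,b,n} \le \hat{\mathcal{C}}_{j,b,n} := \sup_{C_n(\hat{c})} a_b Q_{j,b,n} \le \bar{\mathcal{C}}_{j,b,n} := \sup_{\Theta_I^{\epsilon_n}} a_b Q_{j,b,n}, \quad \text{for all } j\le B_n, \]

where index $j$ denotes that the statistic was computed using the $j$-th subsample; the total number of subsamples is $B_n$. Define $\hat{G}_{b,n}(x) := B_n^{-1}\sum_{j=1}^{B_n} 1\{\hat{\mathcal{C}}_{j,b,n}\le x\}$. Hence wp $\to 1$

\[ \bar{G}_{b,n}(x) := B_n^{-1}\sum_{j=1}^{B_n} 1\{\bar{\mathcal{C}}_{j,b,n}\le x\} \le \hat{G}_{b,n}(x) \le \underline{G}_{b,n}(x) := B_n^{-1}\sum_{j=1}^{B_n} 1\{\underline{\mathcal{C}}_{j,b,n}\le x\}. \]

By Step 2 below, $\bar{G}_{b,n}(x)\to_p G(x) = P\{\mathcal{C}\le x\}$ and $\underline{G}_{b,n}(x)\to_p G(x) = P\{\mathcal{C}\le x\}$, for each $x > 0$. This proves that

\[ \hat{G}_{b,n}(x)\to_p G(x) = P\{\mathcal{C}\le x\} \quad \text{for each } x > 0. \tag{6.1} \]

Convergence of the distribution function at continuity points implies convergence of the quantile function at continuity points. By C.4, $c(\alpha) := G^{-1}(\alpha)$ is continuous in $\alpha\in(0,1)$. Hence, (6.1) implies that $\hat{c} := \hat{G}_{b,n}^{-1}(\alpha)\to_p G^{-1}(\alpha)$ for each $\alpha\in(0,1)$.

Step 2. Define $\underline{\mathcal{C}}_b := \sup_{\Theta_I^{\eta_n}} a_b Q_b$ and $\bar{\mathcal{C}}_b := \sup_{\Theta_I^{\epsilon_n}} a_b Q_b$. Write $\underline{G}_{b,n}(x) =_{(1)} E_P[\underline{G}_{b,n}(x)] + o_p(1) = P\{\underline{\mathcal{C}}_b\le x\} + o_p(1) =_{(2)} P\{\mathcal{C}\le x\} + o_p(1)$ at each $x > 0$. Conclusion (1) follows by $\mathrm{Var}_P(B_n^{-1}\sum_{j=1}^{B_n} 1\{\underline{\mathcal{C}}_{j,b,n}\le x\}) = o(1)$. For i.i.d. data, this follows from $B_n\to\infty$ and the Hoeffding inequality for bounded U-statistics; for stationary $\alpha$-mixing series, this follows from $B_n\to\infty$ and an upper bound on covariance given in the proof of Theorem 3.2.1 in Politis, Romano, and Wolf (1999). Conclusion (2) follows by C.5 and C.4 and by $\epsilon_n = o(1/a_b^{1/\gamma})$ and $\eta_n = o(1/a_b^{1/\gamma})$ arising due to restrictions on the subsample size $b$ and the rate $a_n$ stated in conditions (b,c) of this theorem. Likewise, conclude $\bar{G}_{b,n}(x)\to_p G(x) = P\{\mathcal{C}\le x\}$. □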

Proof  of  Part  (2).  The  result  follows  from  Lemma  3.1  D 

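6.5. Proof of Lemma 3.2. Proof of Part (1). Conditions S.1 and S.2 immediately imply (3.4). □

Proof of Part (2). This part shows that S.3 implies S.2. Note that for any $\delta > 0$ and $\epsilon > 0$ there exists a finite set $M(\epsilon)\subset V_\infty^\delta$ such that

\[ \limsup_{n\to\infty} P\{\sup_{V_n^\delta}\ell_n\le r\} \le_{(i)} \limsup_{n\to\infty} P\{\max_{M(\epsilon)}\ell_n\le r\} \le_{(ii)} P\{\max_{M(\epsilon)}\ell_\infty\le r\} \le_{(iii)} P\{\sup_{V_\infty^\delta}\ell_\infty\le r+\epsilon\} + \epsilon, \]

where inequality (i) follows from $\sup_{V_n^\delta}\ell_n\ge\sup_{M(\epsilon)}\ell_n$, (ii) from the finite-dimensional convergence condition S.3(A), and (iii) from the finite-dimensional approximability condition S.3(B) applied for $n = \infty$. Since $\epsilon$ is arbitrary, $\limsup_{n\to\infty} P\{\sup_{V_n^\delta}\ell_n\le r\}\le P\{\sup_{V_\infty^\delta}\ell_\infty\le r\}$. Further, for any $\delta > 0$ and $\epsilon > 0$ there exists a finite set $M(\epsilon)\subset V_\infty^\delta$ such that

\[ \liminf_{n\to\infty} P\{\sup_{V_n^\delta}\ell_n < r\} \ge_{(i)} \liminf_{n\to\infty} P\{\max_{M(\epsilon)}\ell_n < r-\epsilon\} - \epsilon \ge_{(ii)} P\{\max_{M(\epsilon)}\ell_\infty < r-\epsilon\} - \epsilon \ge_{(iii)} P\{\sup_{V_\infty^\delta}\ell_\infty < r-\epsilon\} - \epsilon, \]

where inequality (i) follows from the finite-dimensional approximability condition S.3(B), (ii) from the finite-dimensional convergence condition S.3(A), and (iii) from $\sup_{V_\infty^\delta}\ell_\infty\ge\sup_{M(\epsilon)}\ell_\infty$. Since $\epsilon$ is arbitrary, $\liminf_{n\to\infty} P\{\sup_{V_n^\delta}\ell_n < r\}\ge P\{\sup_{V_\infty^\delta}\ell_\infty < r\}$. Conclude by the Portmanteau lemma that $\sup_{V_n^\delta}\ell_n\to_d\sup_{V_\infty^\delta}\ell_\infty$. The joint convergence of $\sup_{V_n^\delta}\ell_n$ over any finite set of values of $\delta$, as required in S.2, follows similarly. □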

6.6. Proof of Theorem 4.1. Proof of Part (1). The proof is organized in the following steps. Step 1 verifies C.1 and C.2. Step 2 gives an auxiliary basic approximation for $\ell_n$. Using Step 2, Step 3 verifies C.4, Step 4 verifies C.5, and Step 5 verifies S.1-S.3.

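Step 1 (C.1 and C.2: Uniform Convergence and Quadratic Minorants). Condition C.1 is immediate from condition M.1(a,c,d,e). In particular, uniform convergence and the rates of convergence $a_n = n$ and $b_n = \sqrt{n}$ in C.1 follow from $\{m_i(\theta),\theta\in\Theta\}$ being P-Donsker and having $E_P[m_i(\theta)] = 0$ on $\Theta_I$. To verify C.2 observe that wp $\to 1$, uniformly in $\theta\in\Theta$,

\[ \begin{aligned} nQ_n(\theta) &= \|(G_n[m_i(\theta)] + \sqrt{n}E_P[m_i(\theta)])'W_n^{1/2}(\theta)\|^2 && \text{by definition} \\ &\ge C\cdot\|G_n[m_i(\theta)] + \sqrt{n}E_P[m_i(\theta)]\|^2 && \text{by } \inf_\theta \mathrm{mineig}\, W_n(\theta)\ge C > 0 \text{ wp} \to 1, \text{ by M.1(e)} \\ &\ge C\cdot\left|\sqrt{n}\|E_P[m_i(\theta)]\| - \|G_n[m_i(\theta)]\|\right|^2 && \text{by inequality } \|x+y\|\ge\left|\|y\|-\|x\|\right| \\ &\ge C\cdot\left|C\cdot\sqrt{n}(d(\theta,\Theta_I)\wedge\delta) - O_p(1)\right|^2 && \text{by } \sup_{\theta\in\Theta}\|G_n[m_i(\theta)]\| = O_p(1) \text{ and M.1(d)}, \end{aligned} \tag{6.2} \]

where $\sup_{\theta\in\Theta}\|G_n[m_i(\theta)]\| = O_p(1)$ follows from P-Donskerness. Therefore, for any $\epsilon > 0$ we can choose $(\kappa_\epsilon, n_\epsilon)$ large enough so that for all $n\ge n_\epsilon$ with probability at least $1-\epsilon$

\[ nQ_n(\theta)\ge\tfrac{1}{2}\cdot C\cdot C^2\cdot n\cdot[d(\theta,\Theta_I)\wedge\delta]^2 \quad \text{uniformly on } \{\theta\in\Theta : d(\theta,\Theta_I)\ge(\kappa_\epsilon/n)^{1/2}\}. \]

This verifies C.2.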

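Step 2 (An Auxiliary Expansion). Write $\ell_n(\theta,\lambda) = \|\sqrt{n}E_n[m_i(\theta+\lambda/\sqrt{n})]'W_n^{1/2}(\theta+\lambda/\sqrt{n})\|^2 = \|(G_n[m_i(\theta+\lambda/\sqrt{n})] + \sqrt{n}E_P[m_i(\theta+\lambda/\sqrt{n})])'W_n^{1/2}(\theta+\lambda/\sqrt{n})\|^2$. For any non-empty compact subset $K$ of $\mathbb{R}^d$, we have uniformly in $(\theta,\lambda)\in\Theta_I\times K$: (1) $G_n[m_i(\theta+\lambda/\sqrt{n})] = G_n[m_i(\theta)] + o_p(1)$, by the stochastic equicontinuity arising due to P-Donskerness, (2) $W_n(\theta+\lambda/\sqrt{n}) = W(\theta) + o_p(1)$, by M.1(e), (3) $G_n[m_i(\theta)] =_d \Delta(\theta) + o_p(1)$ in $L^\infty(\Theta)$, by P-Donskerness, where $\Delta(\theta)$ is the Gaussian process defined in the statement of the theorem, and (4) $\sqrt{n}E_P[m_i(\theta+\lambda/\sqrt{n})] = G(\theta)\lambda + o(1)$, by M.1(d) and by $E_P[m_i(\theta)] = 0$ for all $\theta\in\Theta_I$. These results imply that

\[ \ell_n(\theta,\lambda) =_d \underbrace{\|(\Delta(\theta)+G(\theta)\lambda)'W^{1/2}(\theta)\|^2}_{\ell_\infty(\theta,\lambda)} + o_p(1) \quad \text{in } L^\infty(\Theta_I\times K). \]

Note that $\ell_\infty(\theta,\lambda)$ is stochastically equicontinuous in $L^\infty(\Theta_I\times K)$, because $(\theta,\lambda)\mapsto(\Delta(\theta), G(\theta)\lambda, W(\theta))$ is stochastically equicontinuous in $L^\infty(\Theta_I\times K)$.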

Step 3 (C.4: Convergence of $\mathcal{C}_n$). By Step 2, $\mathcal{C}_n =_d \sup_{\theta\in\Theta_I}\|\Delta(\theta)'W^{1/2}(\theta)\|^2 + o_p(1) = \mathcal{C} + o_p(1)$, where $\mathcal{C} > 0$ a.s. and has a continuous distribution function by Theorem 11.1 of Davydov, Lifshits, and Smorodina (1998). This verifies C.4.

Step  4  (C.5:  Approximability  of  C„).  By  expansions  in  Step  2 

Cn{5n)  =      sup      nQn{9)  =d      sup      \\A{9)'W^'\9)\\  +  Op{l)  =d  sup  \\A{9)'W'I\9)\\  +Op(l), 

C 

where  the  last  equahty  follows  by  stochastic  equicontinuity  of  0  i->  A{9yW^/'^{9).  This  verifies  C.5. 

Step 5 (S.1-S.3: Limits of Related Statistics). This step shows that if M.1(b) holds in addition to M.1(a,c,d,e), then S.1 and S.3 hold. S.3 implies S.2 by Lemma 3.2. M.1(b) states $d_H(V_n^\delta, V_\infty^\delta) = o(1)$. Then, for some $\epsilon_n\downarrow 0$, $|\sup_{V_n^\delta}\ell_n - \sup_{V_\infty^\delta}\ell_n| \le \sup_{\|(\theta,\lambda)-(\theta',\lambda')\|\le\epsilon_n} |\ell_n(\theta,\lambda)-\ell_n(\theta',\lambda')| = \sup_{\|(\theta,\lambda)-(\theta',\lambda')\|\le\epsilon_n} |\ell_\infty(\theta,\lambda)-\ell_\infty(\theta',\lambda') + o_p(1)| + o_p(1) = o_p(1)$ by Step 2 and stochastic equicontinuity of $\ell_\infty(\theta,\lambda)$. This verifies S.1(B). Condition M.1(a) implies S.1(A).

By Step 2, the finite-dimensional limit of $\ell_n(\theta,\lambda)$ equals $\ell_\infty(\theta,\lambda) = \|(\Delta(\theta)+G(\theta)\lambda)'W^{1/2}(\theta)\|^2$. This verifies S.3(A).

Finally, note that by stochastic equicontinuity of $\ell_\infty(\theta,\lambda)$ and Step 2, the finite-dimensional approximability condition S.3(B) is trivially satisfied. □

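Proof of Part (2). The proof is similar to the proof of Part (1), and it is therefore omitted. In particular, we have that $n\tilde{Q}_n(\theta) = nQ_n(\theta) - n\inf_{\theta'\in\Theta} Q_n(\theta')$, where asymptotic approximations for the first term are identical to the proof of Part (1). The second term $\inf_{\theta'\in\Theta} nQ_n(\theta')$ can be arbitrarily well approximated by $\inf_{(\theta,\lambda)\in V_n^\delta} nQ_n(\theta+\lambda/\sqrt{n})$ for a sufficiently large $\delta$. Then as in Part (1) it follows that $\inf_{(\theta,\lambda)\in V_n^\delta} nQ_n(\theta+\lambda/\sqrt{n}) =_d \inf_{(\theta,\lambda)\in V_\infty^\delta}\ell_\infty(\theta,\lambda) + o_p(1)$. Setting $\delta$ arbitrarily large gives that $\inf_{\theta'\in\Theta} nQ_n(\theta') =_d \inf_{(\theta,\lambda)\in V_\infty^\infty}\ell_\infty(\theta,\lambda) + o_p(1)$. The limit $\inf_{(\theta,\lambda)\in V_\infty^\infty}\ell_\infty(\theta,\lambda)$ exists and is tight due to monotone convergence: as $\delta\uparrow\infty$, $V_\infty^\delta\uparrow V_\infty^\infty$, and $\inf_{(\theta,\lambda)\in V_\infty^\delta}\ell_\infty(\theta,\lambda)\downarrow\inf_{(\theta,\lambda)\in V_\infty^\infty}\ell_\infty(\theta,\lambda)\ge 0$ a.s. □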


6.7.  Proof  of  Lemma  4.1.  PROOF  OF  PART  (1).  This  part  holds  trivially.  D 
Proof  of  Part  (2).  Consider  the  simplest  case  where  G  =  9^  is  convex  and  compact.  Define 

VM  ^  {X  e  Bs  :  X  €  V^iQg  -  9)  for  some  n'}.  Define  V^  =  {(6',A)  :  6  eQi,Xe  v^(0g  -  6)}. 
Note  that  V^  C  V^  by  convexity  of  Qg  and  V^  t  V^  monotonically  in  the  set-theoretic  sense.  This 
implies  convergence  in  the  Hausdorff  distance  because  V^  and  V^  are  subsets  of  a  compact  set. 

Further,  let  V^^  denote  V^  from  the  convex  case.  Define  V^  :=  V^^  nf^j  Mi^,  Mij.  =  {{0,X)  : 
6  e  Qi,X  ^  B6,gr{e  +  X/^)  =  0},  Vi  :=  V^'  n^^,  Ml,,  Ml,  :=  {(0,A)  :  9  e  Qi,X  e 
Bs,Ve9riO)X  =  0}.  We  have  dniVlVl)  <  dH{V^',V^')  +  Er^idniMi^Ml,)  =  o(l),23  where 
the  first  term  is  o(l)  by  the  argument  for  the  convex  case  and  the  second  term  is  bounded  by 
X^r^i  supggQ;^  <^//(-^nr(^)'-^oor(^))i  which  is  o(l)  by  an  argument  similar  to  that  in  Lemma  2  in 
Andrews  (1997).  D 

6.8. Proof of Theorem 4.2. Proof of Part (1). The proof is organized as follows: Step 1 verifies Conditions C.1, C.2, and C.3. Step 2 gives an auxiliary basic approximation for $\ell_n$. Lemma 6.1 gives another approximation. Using Step 2 and Lemma 6.1, Step 3 verifies Condition C.4, Step 4 verifies Condition C.5, and Step 6 verifies Conditions S.1-S.3.

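Step 1 (Verification of C.1, C.2, and C.3). C.1 is immediate from M.2(a,c,d,e). In particular, uniform convergence and the rates of convergence $a_n = n$ and $b_n = \sqrt{n}$ in C.1 follow from $\{m_i(\theta), \theta \in \Theta\}$ being P-Donsker and $E_P[m_i(\theta)] \le 0$ on $\Theta_I$. To verify C.2 observe that wp $\to 1$, uniformly in $\theta \in \Theta$,

$$nQ_n(\theta) = \|(\mathbb{G}_n[m_i(\theta)] + \sqrt{n}E_P[m_i(\theta)])'W_n^{1/2}(\theta)\|_+^2 \quad \text{by definition}$$
$$\ge C \cdot \|\mathbb{G}_n[m_i(\theta)] + \sqrt{n}E_P[m_i(\theta)]\|_+^2 \quad \text{by } \inf_{\theta\in\Theta} \operatorname{mineig} W_n(\theta) \ge C > 0 \text{ wp} \to 1, \text{ by M.2(e)} \qquad (6.3)$$
$$= C \cdot \|\sqrt{n}E_P[m_i(\theta)]\|_+^2 \cdot \left(\|\mathbb{G}_n[m_i(\theta)] + \sqrt{n}E_P[m_i(\theta)]\|_+^2 / \|\sqrt{n}E_P[m_i(\theta)]\|_+^2\right).$$

By M.2(d), $\|\sqrt{n}E_P[m_i(\theta)]\|_+^2 \ge C \cdot n \cdot (d(\theta,\Theta_I) \wedge \delta)^2$ on $\Theta$ for some $C > 0$ and $\delta > 0$. Therefore, for any $\epsilon > 0$ we can choose $(\kappa_\epsilon, n_\epsilon)$ so that for all $n \ge n_\epsilon$ with probability at least $1 - \epsilon$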

nQn{9)  >\-C,-C  -n-  {d{9,  0;)  A  (5)^    uniformly  in  {6i  G  0  :  d{9, 0/)  >  ^Jn^/^}. 

This follows by (6.3), by $\|y + x\|_+ / \|x\|_+ \to 1$ as $\|x\|_+ \to \infty$ for any $y \in \mathbb{R}^J$, and by $\sup_{\theta\in\Theta} \|\mathbb{G}_n[m_i(\theta)]\| = O_p(1)$, where the latter holds by the P-Donsker property. This verifies Condition C.2.


^{23} This follows by the elementary inequality $d_H(A \cap B, C \cap D) \le d_H(A \cap B, C \cap B) + d_H(C \cap D, C \cap B) \le d_H(A, C) + d_H(B, D)$.
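To verify C.3 observe that wp $\to 1$, uniformly in $\theta \in \Theta_I$,

$$nQ_n(\theta) = \|(\mathbb{G}_n[m_i(\theta)] + \sqrt{n}E_P[m_i(\theta)])'W_n^{1/2}(\theta)\|_+^2 \quad \text{by definition}$$
$$\le C' \cdot \|\mathbb{G}_n[m_i(\theta)] + \sqrt{n}E_P[m_i(\theta)]\|_+^2 \quad \text{by } \sup_\theta \operatorname{maxeig} W_n(\theta) \le C' < \infty \text{ wp} \to 1, \text{ by M.2(e)}$$
$$\le C' \cdot \sum_{j \le J} |\mathbb{G}_n[m_{ij}(\theta)] + \sqrt{n}E_P[m_{ij}(\theta)]|_+^2, \quad \text{where subscript } j \text{ denotes the } j\text{-th element of vector } m_i(\theta)$$
$$\le C' \cdot \sum_{j \le J} |O_p(1) - \sqrt{n} \cdot C \cdot (d(\theta, \Theta \setminus \Theta_I) \wedge \delta)|_+^2 \quad \text{for some } C > 0 \text{ and } \delta > 0 \text{ by M.2(f)}.$$

Therefore, for any $\epsilon > 0$ we can choose $\kappa_\epsilon$ large enough so that for all $n \ge n_\epsilon$ with probability at least $1 - \epsilon$

$$Q_n(\theta) = 0 \quad \text{uniformly on } \{\theta \in \Theta_I : d(\theta, \Theta \setminus \Theta_I) \ge \kappa_\epsilon / n^{1/2}\}.$$

This verifies Condition C.3.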

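Step 2. (A Basic Approximation). Write $\ell_n(\theta,\lambda) = \|\sqrt{n}E_n[m_i(\theta + \lambda/\sqrt{n})]'W_n^{1/2}(\theta + \lambda/\sqrt{n})\|_+^2 = \|(\mathbb{G}_n[m_i(\theta + \lambda/\sqrt{n})] + \sqrt{n}E_P[m_i(\theta + \lambda/\sqrt{n})])'W_n^{1/2}(\theta + \lambda/\sqrt{n})\|_+^2$. We have for any $\delta > 0$, uniformly in $(\theta,\lambda) \in (\Theta \times B_\delta)$: (1) $\mathbb{G}_n[m_i(\theta + \lambda/\sqrt{n})] = \mathbb{G}_n[m_i(\theta)] + o_p(1)$, by the stochastic equicontinuity implied by the P-Donsker property, (2) $W_n(\theta + \lambda/\sqrt{n}) = W(\theta) + o_p(1)$, by M.2(e), (3) $\mathbb{G}_n[m_i(\theta)] =_d \Delta(\theta) + o_p(1)$ in $L^\infty(\Theta)$, by the P-Donsker property, where $\Delta(\theta)$ is the Gaussian process defined in the statement of the theorem, (4) $\sqrt{n}E_P[m_i(\theta + \lambda/\sqrt{n})] - \sqrt{n}E_P[m_i(\theta)] = G(\theta)\lambda + o(1)$, by M.2(d). Therefore

$$\ell_n(\theta,\lambda) =_d \|(\Delta(\theta) + G(\theta)\lambda + \sqrt{n}E_P[m_i(\theta)])'W^{1/2}(\theta) + o_p(1)\|_+^2, \quad \text{in } L^\infty(\Theta_I \times B_\delta).$$

Steps 3, 4, and 5 also make use of the following result.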
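Lemma 6.1. The following approximation is true:

$$\sup_{V_n^\delta} \ell_n(\theta,\lambda) =_d \sup_{V_n^\delta} \|(\Delta(\theta) + G(\theta)\lambda + \sqrt{n}E_P[m_i(\theta)])'W^{1/2}(\theta) + o_p(1)\|_+^2$$
$$=_{(*)} \sup_{V_\infty^\delta} \|(\Delta(\theta) + G(\theta)\lambda + \xi(\theta))'W^{1/2}(\theta) + o_p(1)\|_+^2 \qquad (6.4)$$
$$= \max_{\mathcal{J}} \sup_{(\theta,\lambda)\in V_\infty^\delta,\ \theta\in\Theta_{\mathcal{J}}} \sum_{j\in\mathcal{J}} |(\Delta_j(\theta) + G_j(\theta)'\lambda)W_{jj}^{1/2}(\theta) + o_p(1)|_+^2,$$

where $\Theta_{\mathcal{J}} := \{\theta \in \Theta_I : E_P[m_{ij}(\theta)] = 0\ \forall j \in \mathcal{J},\ E_P[m_{ij}(\theta)] < 0\ \forall j \in \mathcal{J}^c\}$, $\mathcal{J}$ denotes any non-empty subset of $\{1, ..., J\}$, and

$$\xi_j(\theta) := 0 \text{ if } E_P[m_{ij}(\theta)] = 0 \quad \text{and} \quad \xi_j(\theta) := -\infty \text{ if } E_P[m_{ij}(\theta)] < 0. \qquad (6.5)$$

The proof of this lemma is given below, immediately after the proof of this theorem.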

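Step 3. (C.4: Convergence of $\mathcal{C}_n$) Application of Lemma 6.1 for $V_n^\delta = V_\infty^\delta = \Theta_I \times \{0\}$ yields

$$\mathcal{C}_n = \sup_{\theta\in\Theta_I} nQ_n(\theta) =_d \sup_{\theta\in\Theta_I} \|(\Delta(\theta) + \xi(\theta))'W^{1/2}(\theta) + o_p(1)\|_+^2$$
$$= \max_{\mathcal{J}} \sup_{\theta\in\Theta_{\mathcal{J}}} \sum_{j\in\mathcal{J}} |\Delta_j(\theta)W_{jj}^{1/2}(\theta) + o_p(1)|_+^2. \qquad (6.6)$$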
Hence $P[\mathcal{C}_n \le c] \to P[\mathcal{C} \le c]$ for each $c > 0$, for $\mathcal{C}$ defined in the statement of the theorem. By Theorem 11.1 of Davydov, Lifshits, and Smorodina (1998), non-degeneracy of the covariance function of $\Delta(\theta)$ implies that $\mathcal{C}$ has a continuous distribution function on $[0, \infty)$ with a possible point mass at $c = 0$. To show $P[\mathcal{C}_n = 0] \to P[\mathcal{C} = 0]$, note that non-degeneracy implies that $Y = \max_{\mathcal{J}} \max_{j\in\mathcal{J}} \sup_{\theta\in\Theta_{\mathcal{J}}} [\Delta_j(\theta)W_{jj}^{1/2}(\theta)]$ has a continuous distribution function on $\mathbb{R}$. Then by (6.6) $P[\mathcal{C}_n \le 0]$ is bounded above (below) by $P[Y \le \epsilon_n]$ with some $\epsilon_n \downarrow 0$ ($\epsilon_n \uparrow 0$), and $P[Y \le \epsilon_n] \to P[Y \le 0] = P[\mathcal{C} \le 0]$. This verifies Condition C.4.
Step 4. (C.5: Approximability of $\mathcal{C}_n$) By Step 2

sup     nQn{e)=d      sup      \\{A{e)  +  ^Ep[mi{9)]yw'/^{9)  +  Opil)\\l 

and  by  stochastic  equicontinuity  of  ^  h^  {A{9),W^^'^{9))  and  by  supug, _0»^j^,/:j^\\^/n{Ep[mi{9)]  — 
Ep[mi{9')])\\  =  o(l)  it  follows  that  for  any  (5„  j  0  or  (5„  T  0 

sup      \\iAi9)+V^Ep[m,{9)]yw'/\9)+Oj,{l)\\l  =  sup  \\{A{9)+V^Ep[mi{e)]yw'/^i9)+Op{l)\\l. 

Then it follows as in Step 3 that $P[\mathcal{C}_n(\delta_n) \le c] \to P[\mathcal{C} \le c]$ for each $c > 0$. This verifies Condition C.5.

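Step 5. (Verification of S.1-S.3) S.1(A) follows from M.2(a). S.1(B) and S.2 follow from Lemma 6.1. Further, in equation (6.4), for each $\mathcal{J}$, $\sup_{(\theta,\lambda)\in V_\infty^\delta,\ \theta\in\Theta_{\mathcal{J}}} \sum_{j\in\mathcal{J}} |(\Delta_j(\theta) + G_j(\theta)'\lambda)W_{jj}^{1/2}(\theta) + o_p(1)|_+^2$ admits finite-dimensional approximation by stochastic equicontinuity of $(\theta,\lambda) \mapsto (\Delta(\theta), G(\theta)\lambda, W(\theta))$ in $L^\infty(\Theta \times B_\delta)$, which implies S.3(B). By Step 2, the finite-dimensional limit of $\ell_n(\theta,\lambda)$ equals $\ell_\infty(\theta,\lambda) := \|(\Delta(\theta) + G(\theta)\lambda + \xi(\theta))'W^{1/2}(\theta)\|_+^2$, which verifies S.3(A). □

Proof of Part (2). The proof is similar to the proof of Part (1), and it is therefore omitted. □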

6.9. Proof of Lemma 6.1. The first equality in (6.4) is immediate by Step 2 of the proof of Theorem 4.2. Equality (*) in (6.4), the main claim of the lemma, is proven as follows. Define

$$f_n(\theta,\lambda,x) := \|(\Delta(\theta) + G(\theta)\lambda + \sqrt{n}E_P[m_i(\theta)])'W^{1/2}(\theta) + x\|_+^2,$$
$$g_n(\theta,\lambda,x) := \|(\Delta(\theta) + G(\theta)\lambda + \xi(\theta))'W^{1/2}(\theta) + x\|_+^2. \qquad (6.7)$$

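Step 1. Wp $\to 1$, for some $\epsilon_n \downarrow 0$, $g_n(\theta,\lambda,-\epsilon_n) \le_{(i)} f_n(\theta,\lambda,-\epsilon_n) \le_{(ii)} \ell_n(\theta,\lambda) \le_{(iii)} f_n(\theta,\lambda,\epsilon_n)$. Here (i) follows by $\sqrt{n}E_P[m_i(\theta)] \ge \xi(\theta)$ for each $\theta \in \Theta_I$ and by monotonicity: $x_1 \ge x_2$ implies $\|(\Delta(\theta) + G(\theta)\lambda + x_1)'W^{1/2}(\theta)\|_+^2 \ge \|(\Delta(\theta) + G(\theta)\lambda + x_2)'W^{1/2}(\theta)\|_+^2$, recalling that $W(\theta)$ is diagonal with positive diagonal entries, and (ii) and (iii) follow from Step 2 of the proof of Theorem 4.2. Therefore, wp $\to 1$, for some $\epsilon_n \downarrow 0$

$$\sup_{V_n^\delta} g_n(\theta,\lambda,-\epsilon_n) \le \sup_{V_n^\delta} \ell_n(\theta,\lambda) \le \sup_{V_n^\delta} f_n(\theta,\lambda,\epsilon_n).$$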

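Step 2. Furthermore, for any $\epsilon_n \uparrow 0$ or $\epsilon_n \downarrow 0$

$$\sup_{V_n^\delta} g_n(\theta,\lambda,\epsilon_n) = \sup_{V_\infty^\delta} g_n(\theta,\lambda,o_p(1)) = \sup_{\bar V_\infty^\delta} g_n(\theta,\lambda,o_p(1)), \qquad (6.8)$$

where $\bar V_\infty^\delta$ is the closure of $V_\infty^\delta$. To show this, write

$$\sup_{V_n^\delta} g_n(\theta,\lambda,\epsilon_n) = \max_{\mathcal{J}} \sup_{V_n^\delta | \theta\in\Theta_{\mathcal{J}}} \sum_{j\in\mathcal{J}} |(\Delta_j(\theta) + G_j(\theta)'\lambda)W_{jj}^{1/2}(\theta) + \epsilon_n|_+^2.$$

Then by M.2(b) for every $\mathcal{J}$, $d_H(V_n^\delta | \theta \in \Theta_{\mathcal{J}}, V_\infty^\delta | \theta \in \Theta_{\mathcal{J}}) = o(1)$, which implies by stochastic equicontinuity of $(\theta,\lambda) \mapsto (\Delta(\theta), G(\theta)\lambda, W(\theta))$ in $L^\infty(\Theta \times B_\delta)$ that

$$\sup_{V_n^\delta | \theta\in\Theta_{\mathcal{J}}} \sum_{j\in\mathcal{J}} |(\Delta_j(\theta) + G_j(\theta)'\lambda)W_{jj}^{1/2}(\theta) + \epsilon_n|_+^2$$
$$= \sup_{V_\infty^\delta | \theta\in\Theta_{\mathcal{J}}} \sum_{j\in\mathcal{J}} |(\Delta_j(\theta) + G_j(\theta)'\lambda)W_{jj}^{1/2}(\theta) + o_p(1)|_+^2$$
$$= \sup_{\bar V_\infty^\delta | \theta\in\Theta_{\mathcal{J}}} \sum_{j\in\mathcal{J}} |(\Delta_j(\theta) + G_j(\theta)'\lambda)W_{jj}^{1/2}(\theta) + o_p(1)|_+^2,$$

so relation (6.8) follows.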

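Step 3. This step shows that for some $\epsilon_n' \downarrow 0$,

$$\sup_{V_n^\delta} f_n(\theta,\lambda,\epsilon_n) \le \sup_{V_\infty^\delta} g_n(\theta,\lambda,\epsilon_n') \quad \text{wp} \to 1. \qquad (6.9)$$

Observe that for any $\theta_n \in \Theta_I$ converging to $\theta \in \Theta_I$,

$$\limsup_n \sqrt{n}E_P[m_{ij}(\theta_n)] \le \xi_j(\theta) \text{ if } \xi_j(\theta) = 0, \qquad \limsup_n \sqrt{n}E_P[m_{ij}(\theta_n)] = \xi_j(\theta) \text{ if } \xi_j(\theta) = -\infty. \qquad (6.10)$$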

Let  Q,n,£  —  {lo  ^rt  :  supgg©^^;^^^^  l|A(0)||  <  K^].  For  any  £  >  0,  there  exists  K^  such  that  P(n„_£)  > 
1  —  e  for  all  n>  Ue-  Suppose  that  relation  (6.9)  does  not  hold,  then  there  must  exist  constants  e  >  0 
and  e  >  0  and  a  subsequence  (w„(fc),  On(k),K{k))  with  u^^k)  £  ^n{k),e^  {Sn{k),K{k))  e  V^^^^y  such  that 

lim[/n(fc)(^n(fc))'*^n(fc).en(fc))  "  SUp5„(fc)  (6*,  A,  e)](w„(fc))  >  0.  (6.11) 

Select  a  further  subsequence  such  that  ^n(fc(0)  ^  ^*  ^^'^  ^n{k{i))  ~*  '^*!  where  (^*,  A*)  is  in  the  closure 
of  V^  by  dniV^,  V^)  ^  0  and  by  V^CQj  x  Bg.  As  in  Step  2  conclude  that 

sup5„(6l,A,e)  =sup5„(6',A,e)  >  g„((9*,  A*,e/2)  wp  -^  1  , 

which  together  with  (6.11)  gives  that  lim; [/„(;,(,)) (6i„(fc(i)),  A„(fc(;)),0)  -  g„(fc(,)) (6**,  A*,  e/2)](tj„(fc(,)))  > 
0.  Given  the  definition  of  /„  and  Qn  stated  in  (6.7),  this  inequality  can  occur  only  if 


limsupVnM))£^pKj(^^(/c(/)))]  >  Ci(^*) 

for  some  j.  This  gives  a  contradiction  to  (6.10).  Therefore,  the  claim  of  Step  3  is  correct. 

Combining Steps 1, 2, and 3 implies the result of the lemma. □


6.10.  Proof  of  Lemma  4.2.  Define  V^j  :=  V^\9  G  Oj.  Apply  the  proof  of  Lemma  4.1  for  V^  to 
7. Extension: Pointwise Approach

Suppose that one is interested in a particular parameter $\theta^*$ inside $\Theta_I$. The inference about some $\theta^*$ in $\Theta_I$ is well motivated when there is a sense in which $\theta^*$ is the true parameter. The latter is typically the case when it is maintained that the economic models are correct representations of data-generating processes of real data for some parameter value $\theta^*$.^{24} In this scenario, $\Theta_I$ is not of interest per se, but rather $\theta^*$ is.

In order to facilitate inference about $\theta^*$ we make the following assumption.

Condition C.6. Suppose there exists $a_n \to \infty$ such that, for $\mathcal{C}_n(\theta) := a_nQ_n(\theta)$, $P(\mathcal{C}_n(\theta) \le c) \to P(\mathcal{C}(\theta) \le c)$ for each $c > 0$ and each $\theta \in \Theta_I$, where $\mathcal{C}(\theta)$ is a real random variable that has a continuous distribution function on $[0, \infty)$ and $\alpha$-quantile denoted as $c(\alpha,\theta)$. Moreover, for at least one $\theta \in \Theta_I$, $\mathcal{C}(\theta) > 0$ with positive probability.

Using the fact that $Q(\theta) = 0$ at $\theta = \theta^*$, we construct a confidence region for $\theta^*$ as follows. We test whether $Q(\theta) = 0$ for each $\theta \in \Theta$. Then we collect all $\theta \in \Theta$ that pass the test to form a confidence region for $\theta^*$. More precisely, we collect all $\theta \in \Theta$ such that $a_nQ_n(\theta) \le c(\alpha,\theta)$.

Towards the construction of confidence regions, suppose the estimate $\hat c(\theta)$ is available such that $\hat c(\theta) \to_p c(\alpha,\theta)$ for each $\theta \in \Theta_I$. Consistent estimates $\hat c(\theta)$ can be obtained by subsampling or, for the moment condition models, through the use of the limit distributions obtained in Theorem 7.3. Consistency of the subsampling estimate $\hat c(\theta)$ follows by the standard argument, e.g. the one given in Step 2 of the Proof of Theorem 3.3. It should be noted that subsampling is generally less accurate than the use of the limit distributions.
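The subsampling route to the pointwise critical value can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: it uses the interval-censored mean example with the two moment inequalities $E[Y_1] - \theta \le 0$ and $\theta - E[Y_2] \le 0$, unit weights, and $a_n = n$ (so $a_b = b$). The function names, the subsample size `b`, and the number of draws are hypothetical choices.

```python
import math
import random


def criterion(theta, y1, y2):
    """Sample criterion Q_n(theta) for the moment inequalities
    E[Y1] - theta <= 0 and theta - E[Y2] <= 0, with unit weights (an assumption)."""
    n = len(y1)
    m1 = sum(y1) / n - theta   # positive only when the first inequality is violated
    m2 = theta - sum(y2) / n   # positive only when the second inequality is violated
    return max(m1, 0.0) ** 2 + max(m2, 0.0) ** 2


def subsample_critical_value(theta, y1, y2, b, alpha, draws=500, seed=0):
    """Empirical alpha-quantile of b * Q_b(theta) over random subsamples of
    size b < n drawn without replacement (a_b = b is assumed here)."""
    rng = random.Random(seed)
    n = len(y1)
    stats = []
    for _ in range(draws):
        idx = rng.sample(range(n), b)
        sub1 = [y1[i] for i in idx]
        sub2 = [y2[i] for i in idx]
        stats.append(b * criterion(theta, sub1, sub2))
    stats.sort()
    # Smallest order statistic whose empirical CDF value is at least alpha.
    return stats[max(0, math.ceil(alpha * draws) - 1)]
```

For a value of $\theta$ well inside the sample interval the subsampled statistic is identically zero, while for a $\theta$ far outside it grows in proportion to `b`. The second case is the untruncated behavior that Remark 7.2 addresses by truncating with the region-wise critical value.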

Let $\hat\Theta_I$ be an estimator of $\Theta_I$ so that $\Theta_I \subseteq \hat\Theta_I$ wp $\to 1$. We also want $\hat\Theta_I$ to be a sharp estimate; for instance, we can set $\hat\Theta_I = C_n(\log n)$, which under C.1 and C.2 is consistent and converges at rate $(\log n / n)^{1/\gamma}$. Let also $\hat c$ be any consistent estimate of the $\alpha$-quantile of $\mathcal{C}$ defined in C.4. Recall that we used $\hat c$ for the construction of the region-wise critical value.

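The following two regions will be considered. The first region is a simple region defined by a single critical value:

$$C_n(\hat c^* \wedge \hat c) = \{\theta \in \Theta : a_nQ_n(\theta) \le \hat c^* \wedge \hat c\}, \quad \text{where } \hat c^* = \sup_{\theta\in\hat\Theta_I} \hat c(\theta). \qquad (7.1)$$

The second region is a region that employs critical values that depend on $\theta$:

$$C_n(\hat c(\cdot) \wedge \hat c) = \{\theta \in \Theta : a_nQ_n(\theta) \le \hat c(\theta) \wedge \hat c\}. \qquad (7.2)$$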
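A minimal sketch of how regions (7.1) and (7.2) could be computed on a finite grid of candidate parameter values. The function and argument names are hypothetical, the grid approximation is an assumption of this illustration, and the supremum defining the single critical value is taken over a supplied (non-empty) list of grid points standing in for the set estimate.

```python
def pointwise_regions(grid, stat, c_hat_theta, c_hat, set_estimate):
    """Return grid versions of regions (7.1) and (7.2).

    grid         : candidate parameter values (list of floats)
    stat         : function theta -> a_n * Q_n(theta)
    c_hat_theta  : function theta -> estimated pointwise critical value
    c_hat        : region-wise critical value used for truncation (float)
    set_estimate : non-empty list of points in the estimate of the identified set
    """
    # Single critical value: sup of the pointwise critical values over the set estimate.
    c_star = max(c_hat_theta(t) for t in set_estimate)
    # Region (7.1): one truncated critical value for every theta.
    region_71 = [t for t in grid if stat(t) <= min(c_star, c_hat)]
    # Region (7.2): theta-specific critical values, each truncated by c_hat.
    region_72 = [t for t in grid if stat(t) <= min(c_hat_theta(t), c_hat)]
    return region_71, region_72
```

Because both thresholds are capped by `c_hat`, every point kept by either region also satisfies `stat(t) <= c_hat`, so each region is contained in the grid version of the region-wise confidence set.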

Remark 7.1. The two constructions are equivalent in many cases, since the objective functions can be transformed to have equal quantiles.^{25} The first construction is more parsimonious and easier to compute and report. Clearly, either region is a subset of, and hence is no larger than, the confidence region $C_n(\hat c)$ for $\Theta_I$.


^{24} There is $\theta^* \in \Theta$ such that the model law $P_{\theta^*}$ agrees with the actual law of data $P$.
^{25} This can be seen by defining the new criterion function $\bar Q_n(\theta) := Q_n(\theta) / \max[c(\alpha,\theta), \epsilon]$ for all $\theta \in \Theta_I$. In many examples this is unnecessary, as criterion functions have the equi-quantile property by using optimal weights.
Remark 7.2. If $\hat c(\theta)$ is obtained by subsampling, the truncation of critical values by $\hat c$ improves Bahadur efficiency. Indeed, if $\theta \notin \Theta_I$, we have that $\hat c(\theta) \to_p +\infty$, typically at the rate $a_b$, but $\hat c(\theta) \wedge \hat c \to_p c(\alpha) < \infty$. Therefore, subsampling implementations that, in contrast to our construction, do not truncate $\hat c(\theta)$ by $\hat c$ suffer from the loss of power in finite samples. Chernozhukov and Fernandez-Val (2005) show, in a different situation, that the Bahadur inefficiency of canonical (untruncated) subsampling leads to a substantial loss of power in finite samples.

Remark 7.3. The construction of either region employs the pointwise inversions of tests of point hypotheses $Q(\theta) = 0$. This follows the Anderson and Rubin (1949) construction of confidence regions for the case of simultaneous equations. In the case of weakly identified and unidentified linear instrumental variable models, the construction was used by Dufour (1997) and Staiger and Stock (1997), among others. In a partially identified dynamic censored regression model, Hu (2002) also employed region (7.2) for inference. In a partially identified instrumental variable quantile regression model, Chernozhukov and Hansen (2004) also use the region (7.2). A previous version of the paper, Chernozhukov, Hong, and Tamer (2002), Appendix G, also gave pointwise constructions. Imbens and Manski (2004) investigate the Wald type inference about $\theta^*$ for the special case where $\theta^*$ is a real parameter known to belong to an interval whose endpoints can be consistently estimated. The analysis here applies to a considerably more general setting.

Remark  7.4.  The  more  recent  developments  in  the  literature  include  Andrews  and  Guggenberger 
(2006)  and  Sheikh  (2006)  who  show  that  the  confidence  regions  of  the  type  proposed  here,  with 
critical  values  obtained  by  subsampling,  have  important  robustness  (uniform  coverage)  properties. 
Note,  however,  that  in  moment  condition  models,  we  can  construct  the  critical  values  using  limit 
distributions,  e.g.  see  Remark  7.6,  which  should  be  preferable  to  subsampling  due  to  higher  accuracy. 

Remark 7.5. Due to reasons given in Remark 7.2, our regions (7.2) constructed using subsampling will be less conservative than the regions studied by Andrews and Guggenberger (2006) and Sheikh (2006). The latter are constructed using the canonical (untruncated) subsampling critical value $\hat c(\theta)$. In contrast, regions (7.2) use the truncated critical value $\hat c(\theta) \wedge \hat c$.

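Theorem 7.1. Suppose that (a) Conditions C.4 and C.6 hold, and (b) for each $\theta \in \Theta_I$ we have $\hat c(\theta) \to_p c(\alpha,\theta)$ and $\hat c \to_p c(\alpha) \ge \sup_{\theta\in\Theta_I} c(\alpha,\theta)$, where $\hat c(\theta) \ge 0$ and $\hat c \ge 0$ with probability 1. Then, (1) for any $\theta^* \in \Theta_I$, $\liminf_{n\to\infty} P\{\theta^* \in C_n(\hat c^* \wedge \hat c)\} \ge \alpha$, and (2) $\liminf_{n\to\infty} P\{\theta^* \in C_n(\hat c(\cdot) \wedge \hat c)\} \ge \alpha$.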

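Proof of Theorem 7.1: Part (1): $\liminf_{n\to\infty} P\{\theta^* \in C_n(\hat c^* \wedge \hat c)\} = \liminf_{n\to\infty} P\{a_nQ_n(\theta^*) \le \hat c^* \wedge \hat c\} \ge_{(i)} \liminf_{n\to\infty} P\{a_nQ_n(\theta^*) \le \hat c(\theta^*) \wedge \hat c\} \ge_{(ii)} \liminf_{n\to\infty} P\{a_nQ_n(\theta^*) \le (c(\alpha,\theta^*) + o_p(1)) \vee 0\} \ge_{(iii)} P\{\mathcal{C}(\theta^*) \le c(\alpha,\theta^*)\} \ge_{(iv)} \alpha$, where (i) follows by construction, (ii) follows by the assumptions on $\hat c$ and $\hat c(\theta)$, (iii) follows by Condition C.6, and (iv) follows by Condition C.6. Part (2): This part trivially follows from inequality (i) in the proof of Part (1). □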

The  following  theorem  provides  consistency  and  rates  of  convergence  of  the  sets  constructed 
above. 

Theorem 7.2 (Consistency and rates of convergence). Suppose C.1, C.2, and conditions of Theorem 7.1 hold. Consider estimators $\hat\Theta_I := \{\theta \in \Theta : a_nQ_n(\theta) \le \hat c(\theta) \wedge \hat c + \kappa_n\}$ and $\tilde\Theta_I := \{\theta \in \Theta : a_nQ_n(\theta) \le \hat c^* \wedge \hat c + \kappa_n\}$, where $\kappa_n := 0$ when C.3 is known to hold, and $\kappa_n := \log n$ otherwise. Then, $d_H(\hat\Theta_I, \Theta_I) = O_p([1/a_n]^{1/\gamma}) = o_p(1)$ if C.3 is known to hold, and $d_H(\hat\Theta_I, \Theta_I) = O_p([\log n / a_n]^{1/\gamma}) = o_p(1)$, otherwise. The same results apply to $\tilde\Theta_I$.

Proof of Theorem 7.2: Since $C_n(\kappa_n) \subseteq \hat\Theta_I \subseteq \tilde\Theta_I \subseteq C_n(\hat c + \kappa_n)$, the rate and consistency results follow from the rates and consistency results for $C_n(\kappa_n)$ and $C_n(\hat c + \kappa_n)$ obtained in Theorem 3.1 and Theorem 3.2. □

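Theorem 7.3. (Limits of $\mathcal{C}_n(\theta)$ in Moment Condition Models) (1) Suppose Condition M.1 holds for the moment equality model. In particular, the P-Donsker condition on moment functions implies $n^{-1/2} \sum_{i=1}^n (m_i(\theta) - E_P[m_i(\theta)]) \to_d \Delta(\theta) = N(0, E_P[\Delta(\theta)\Delta(\theta)'])$. Then, Condition C.6 holds with

$$\mathcal{C}(\theta) := \|\Delta(\theta)'W^{1/2}(\theta)\|^2, \qquad (7.3)$$
$$\mathcal{C}(\theta) := \|\Delta(\theta)'W^{1/2}(\theta)\|^2 - \inf_{(\theta',\lambda)\in V_\infty^\infty} \|(\Delta(\theta') + G(\theta')\lambda)'W^{1/2}(\theta')\|^2, \qquad (7.4)$$

for the case when $Q_n(\theta)$ and $\tilde Q_n(\theta) = Q_n(\theta) - \inf_{\theta'\in\Theta} Q_n(\theta')$ are used for inference, respectively. (2) Suppose Condition M.2 holds for the moment inequality model. Then, Condition C.6 holds with

$$\mathcal{C}(\theta) := \|(\Delta(\theta) + \xi(\theta))'W^{1/2}(\theta)\|_+^2, \qquad (7.5)$$
$$\mathcal{C}(\theta) := \|(\Delta(\theta) + \xi(\theta))'W^{1/2}(\theta)\|_+^2 - \inf_{(\theta',\lambda)\in V_\infty^\infty} \|(\Delta(\theta') + G(\theta')\lambda + \xi(\theta'))'W^{1/2}(\theta')\|_+^2, \qquad (7.6)$$

for the case when $Q_n(\theta)$ and $\tilde Q_n(\theta) = Q_n(\theta) - \inf_{\theta'\in\Theta} Q_n(\theta')$ are used for inference, respectively.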

Proof of Theorem 7.3. Part (1) follows from the proof of Theorem 4.1. Part (2) follows from the proof of Theorem 4.2. □

Remark 7.6. (Quantiles of $\mathcal{C}(\theta)$ by Simulation) The quantiles of $\mathcal{C}(\theta)$, specified in (7.3), (7.4), (7.5), and (7.6), can be obtained by simulating the variable $\mathcal{C}^*(\theta)$ specified respectively in Remarks 4.1, 4.2, 4.3, and 4.4. □
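To make the simulation idea concrete for the limit in (7.5) at a single fixed parameter value, the following sketch draws the Gaussian vector and keeps only the binding moments, since a slack moment has drift equal to minus infinity and contributes nothing to the positive part. It is a simplified illustration, not the procedure of Remarks 4.1-4.4: the components of the Gaussian vector are assumed independent, the inputs are treated as known rather than estimated, and all names are hypothetical.

```python
import math
import random


def simulate_quantile_75(binding, variances, weights, alpha, draws=20000, seed=0):
    """Simulated alpha-quantile of the limit variable in (7.5) at one fixed theta.

    binding   : list of bools, True where the moment holds with equality (xi_j = 0);
                slack moments (xi_j = -inf) drop out of the positive-part norm
    variances : variances of the Gaussian components, ASSUMED independent here
    weights   : positive diagonal entries of the weighting matrix
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(draws):
        total = 0.0
        for is_binding, var, w in zip(binding, variances, weights):
            if is_binding:
                z = rng.gauss(0.0, math.sqrt(var)) * math.sqrt(w)
                total += max(z, 0.0) ** 2   # squared positive part
        sims.append(total)
    sims.sort()
    # Smallest order statistic whose empirical CDF value is at least alpha.
    return sims[max(0, math.ceil(alpha * draws) - 1)]
```

With a single binding moment of unit variance and unit weight, the limit variable is the squared positive part of a standard normal, so it has a point mass of one half at zero. Its 0.95-quantile is the square of the standard normal 0.95-quantile, roughly 2.71, and any quantile at level 0.5 or below is zero.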
ADDENDUM for "Estimation and Confidence Regions for Parameter Sets in
Econometric Models"

8.  Robustness  to  Contiguous  Perturbations  of  P 

In this paper $P$, the true probability measure, is the nuisance parameter. The goal is to examine which contiguous perturbations of the original fixed $P$ preserve or do not preserve the estimation and coverage properties of the confidence regions. The idea of focusing on the local perturbations follows its uses in the confidence interval literature, see notably Dufour (1997), Potscher (1991), and Andrews and Guggenberger (2006). Intuitively, contiguous perturbations of $P$ cannot be statistically detected with certainty, and we therefore want to make sure that contiguous changes in $P$ do not affect the coverage properties of confidence regions. An alternative motivation is that, in the asymptotic context, the relevant parameter space for nuisance parameters consists of contiguous parameter values, which is a standard approach in asymptotic efficiency analysis, see van der Vaart (1998), Chapter 8.7. In fact, finding minimal coverage under contiguous sequences is equivalent to establishing local uniform coverage, when the local nuisance parameters are allowed to vary over a compact set.^{26}

We  focus  on  examining  the  robustness  of  the  main  estimation  and  inferential  results,  the  ones 
stated  in  Theorem  3.1  and  Theorem  3.3. 

8.1. Regular Cases. Consider a triangular sequence of probability measures $\{P_{n,\gamma}, n = 1, 2, ...\}$, where $\gamma$ is an index of a sequence in $\Gamma$ and $\{P_{n,\gamma}, \gamma \in \Gamma, n = 1, ...\} \subset \mathcal{P}$. Let $P^n_{n,\gamma}$ denote the law of data $w_1, ..., w_n$ under $P_{n,\gamma}$. Each $\gamma \in \Gamma$ is such that $P^n_{n,\gamma}$ is contiguous to $P^n$, the law of data $w_1, ..., w_n$ under $P$, namely $P^n(A_n) = o(1)$ implies $P^n_{n,\gamma}(A_n) = o(1)$ for any sequence of measurable events $A_n$.^{27} In what follows, notation $\Theta_I(P)$ is used to reflect that identification region $\Theta_I$ depends on the law of the data $P$. Similarly, notation $c(\alpha, P)$ is used to denote that the $\alpha$-quantile of $\mathcal{C}$ depends on $P$.

Lemma 8.1. [Conditions for Maintaining Consistency, Rates of Convergence, and Coverage] (1) Assume that Conditions C.1 and C.2 hold with $\{P_{n,\gamma}\}$ replacing $\{P\}$, for each $\gamma \in \Gamma$. Then so do conclusions of Theorem 3.1. (2) Assume that Conditions C.1, C.2, and C.4 hold under $\{P_{n,\gamma}\}$ in place of $\{P\}$, for any $\gamma \in \Gamma$, as well as hold under $\{P\}$, with the common limit real random variable $\mathcal{C}$, distribution of which does not depend on $\gamma$. Take any estimate $\hat c \to_p c(\alpha, P)$ under $\{P\}$, for instance, that provided in Sections 3 or 4. Then for each $\gamma \in \Gamma$,

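$$\liminf_{n\to\infty} P^n_{n,\gamma}\{\Theta_I(P_{n,\gamma}) \subseteq C_n(\hat c)\} \ge \alpha \quad \text{and} \quad = \alpha \text{ if } P\{\mathcal{C} > 0\} > \alpha.$$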

The first result states that consistency and rates of convergence will be preserved under sequences as long as C.1 and C.2 hold under sequences (replacing $P$ with $P_{n,\gamma}$ and $\Theta_I$ with $\Theta_I(P_{n,\gamma})$ should cause no ambiguity in the re-statement of C.1 and C.2). The second result of the lemma addresses


^{26} The weak IV example presented below clarifies this statement; see, specifically, equations (8.8)-(8.9).
^{27} Throughout this section, measurable events $A_n$ are events that are measurable with respect to $(\Omega, \mathcal{F})$ completed with respect to both $P^n$ and $P^n_{n,\gamma}$.
coverage properties in the regular case - when the limit of $\mathcal{C}_n$ does not depend on the local sequence.^{28} Note that the coverage result is independent of the way the critical value is estimated.

Note that if $\mathcal{C}_n$ is non-regular - that is, its limit distribution under $\{P_{n,\gamma}\}$ depends on $\gamma$ - the coverage under sequence depends on whether the distribution of $\mathcal{C}_n$ under $P_{n,\gamma}$ is stochastically dominated in large samples by the distribution under fixed sequence $\{P\}$, as stated in Lemma 8.3 below.

Conditions of Lemma 8.1 are verified in our principal applications as follows:

Condition M.3. (Moment Equalities) Suppose that M.1 holds for each $P \in \mathcal{P}$ and that (a) the partial identification condition (4.1) holds uniformly in $\mathcal{P}$, (b) $G(\theta) = \lim_n \nabla_\theta E_{P_{n,\gamma}}[m_i(\theta)]$ exists and is continuous over a neighborhood of $\Theta_I$, for each $\gamma \in \Gamma$, (c) the Donsker condition (4.2) holds under $\{P_{n,\gamma}\}$ in place of $\{P\}$ for each $\gamma \in \Gamma$, with the common limit Gaussian process $\Delta(\theta)$, (d) $E_{P_{n,\gamma}}[m_i(\theta)] = E_P[m_i(\theta)] + o(1)$ for each $\gamma \in \Gamma$, (e) $d_H(\Theta_I(P_{n,\gamma}), \Theta_I(P)) = o(1)$ for each $\gamma \in \Gamma$.

Condition M.4. (Moment Inequalities) Suppose that M.1 holds for each $P \in \mathcal{P}$ and that (a) the partial identification condition (4.5) holds uniformly in $\mathcal{P}$, (b) $G(\theta) = \lim_n \nabla_\theta E_{P_{n,\gamma}}[m_i(\theta)]$ exists and is continuous over a neighborhood of $\Theta_I$, for each $\gamma \in \Gamma$, (c) the Donsker condition (4.2) holds under $\{P_{n,\gamma}\}$ in place of $\{P\}$ for each $\gamma \in \Gamma$, with the common limit Gaussian process $\Delta(\theta)$, (d) $E_{P_{n,\gamma}}[m_i(\theta)] = E_P[m_i(\theta)] + o(1)$ for each $\gamma \in \Gamma$, (e) $d_H(\Theta_I(P_{n,\gamma}), \Theta_I(P)) = o(1)$ and $d_H(\Theta_{\mathcal{J}}(P_{n,\gamma}), \Theta_{\mathcal{J}}(P)) = o(1)$ for each $\mathcal{J}$ and each $\gamma \in \Gamma$.

Condition (a) is a locally uniform partial identification condition. Sufficient conditions for condition (c) are well known and are given in van der Vaart and Wellner (1996), p. 173, including a quadratic-mean-differentiability condition, p. 406. The principal condition is Condition (e), which requires that the perturbations of $P$ affect the identification region smoothly.

Lemma  8.2.  (Coverage,  Consistency,  Rates  under  Regular  Sequences  in  Moment  Condition  Models) 
(1)  Condition  M.3  implies  conditions  of  Lemma  8.1.  (2)  Condition  M.4  implies  conditions  of  Lemma 
8.1. 

Example 1 (contd.) It is helpful to illustrate conditions M.4(a)-(e) via a simple example. Recall the example of interval censored $Y$ without covariates, in which case $\Theta_I(P) = [E_P[Y_1], E_P[Y_2]]$, and suppose $Y_1 \le Y_2$ $P$-a.s. for all $P \in \mathcal{P}$ and that $(Y_1, Y_2)$ are uniformly Donsker in $\mathcal{P}$.^{29} Then conditions M.4(a)-(d) easily follow. To verify M.4(e) note that by contiguity and uniform integrability implied by the uniform in $\mathcal{P}$ Donskerness,

$$(E_{P_{n,\gamma}}[Y_1], E_{P_{n,\gamma}}[Y_2]) \to (E_P[Y_1], E_P[Y_2]),$$

including the case of $[E_P[Y_1], E_P[Y_2]]$ being a singleton. The last point is noteworthy, since Imbens and Manski (2004) used precisely the case of identification region shrinking to a singleton at a $1/\sqrt{n}$ rate as a counterexample to the coverage of certain types of confidence regions.
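As a small numerical illustration of condition M.4(e) in this interval example, the sketch below uses the fact that the Hausdorff distance between two closed intervals on the real line is the larger of the two endpoint gaps. The drift parameters `h1` and `h2`, which move the endpoint means at rate $1/\sqrt{n}$, are hypothetical and only stand in for one possible contiguous perturbation.

```python
import math


def hausdorff_interval(a, b):
    """Hausdorff distance between closed intervals a = (lo, hi) and b = (lo, hi)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def perturbed_region(mu1, mu2, h1, h2, n):
    """Identified interval when the endpoint means drift at rate 1/sqrt(n)
    (h1 and h2 are hypothetical local drift parameters)."""
    return (mu1 - h1 / math.sqrt(n), mu2 + h2 / math.sqrt(n))


# Fixed-P region is the singleton {0}; the perturbed regions shrink towards it.
limit_region = (0.0, 0.0)
distances = [
    hausdorff_interval(perturbed_region(0.0, 0.0, 1.0, 2.0, n), limit_region)
    for n in (100, 10_000, 1_000_000)
]
```

Here `distances` decreases in proportion to $1/\sqrt{n}$, so the Hausdorff distance vanishes even though the limiting identification region is a singleton.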


^{28} The definition of regularity follows that given by van der Vaart and Wellner (1996), p. 413.
^{29} Conditions for the Donskerness uniformly in $\mathcal{P}$ are well known, see van der Vaart and Wellner (1996), p. 168-170.
Conditions M.3 and M.4 are reasonable in many examples we have considered, provided the boundary of $\Theta_I$ (in $\mathbb{R}^d$) is strongly-identified. Conditions M.3 and M.4 are not expected to hold
otherwise.  Therefore,  the  models  with  weak  identification,  cf.  Dufour  (1997),  that  are  local  to  non- 
identification,  are  not  covered  by  the  framework  of  regular  sequences.  Weak  identification  is  not 
our  focus  in  any  case.  However,  Section  8.2  provides  a  general  condition  under  which  the  proposed 
inference  methods  will  work.  The  condition  is  illustrated  with  a  weak  IV  framework  of  Dufour  (1997) 
and  Staiger  and  Stock  (1997). 

8.2. Non-regular cases. The following lemma addresses non-regular cases mentioned earlier, and shows that coverage results will be preserved in much greater generality.

Lemma 8.3 (Maintaining Partial Consistency and Minimal Coverage under Non-Regular Sequences). (1) Suppose that $\sup_{\Theta_I(P_{n,\gamma})} Q_n = O_{P_{n,\gamma}}(1/a_n)$ under $\{P_{n,\gamma}\}$. Then $\Theta_I(P_{n,\gamma}) \subseteq C_n(\hat c)$ wp $\to 1$, provided $\hat c \to_p \infty$, under $\{P_{n,\gamma}\}$. (2) Let there be any estimate $\hat c \to_p c(\alpha, P)$ under $\{P\}$. Suppose that Condition C.4 holds under fixed $P$ with the limit real variable $\mathcal{C}$ that has $\alpha$-quantile $c(\alpha, P)$. Suppose that for each $\gamma \in \Gamma$ and any sequence $\epsilon_n \downarrow 0$, we have

$$\liminf_{n\to\infty} P_{n,\gamma}[\mathcal{C}_n \le (c(\alpha, P) - \epsilon_n) \vee 0] \ge \alpha. \qquad (8.1)$$

Consider any estimate $\hat c \to_p c(\alpha, P)$ under $\{P\}$, for instance, that provided in Section 3 or 4. Then for each $\gamma \in \Gamma$

$$\liminf_{n\to\infty} P_{n,\gamma}\{\Theta_I(P_{n,\gamma}) \subseteq C_n(\hat c)\} \ge \alpha. \qquad (8.2)$$

Note again that the result is independent of the way the critical value is estimated.

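Example (Weak IV). The point of this lemma can be illustrated using a very simple IV example with one regressor:

$$Y = \theta_0 X + \epsilon, \quad \theta_0 \in \Theta \text{ (compact)} \subset \mathbb{R}, \quad X = 0 \cdot Z + v, \quad \text{and } (\epsilon, v) | Z \sim N(0, \Omega),\ Z \sim N(\mu, \sigma_Z^2). \qquad (8.3)$$

The identification region is $\Theta_I(P) = \Theta$, that is, we have complete non-identification. Assume i.i.d. sampling and other conditions as in Section 4 hold under $P$.
Now consider a sequence of models where

$$Y = \theta_0 X + \epsilon, \quad X = (\rho/\sqrt{n}) Z + v, \quad \text{and } (\epsilon, v) | Z \sim N(0, \Omega),\ Z \sim N(0, \sigma_Z^2). \qquad (8.4)$$

Let $\gamma$ index the parameter sequence $\{\rho\}$. Let $P^n_{n,\gamma}$ denote the law of vector $(Y_i, X_i, Z_i, i \le n)$ in (8.4); it is contiguous to law $P^n$. Let $P_{n,\gamma}$ denote the law of the infinite sequence $(Y_i, X_i, Z_i, i < \infty)$ generated according to (8.4).
Note

$$\Theta_I(P_{n,\gamma}) = \Theta_I(P) = \Theta \text{ if } \rho = 0 \quad \text{and} \quad \Theta_I(P_{n,\gamma}) = \theta_0 \in \Theta_I(P) \text{ if } \rho \ne 0. \qquad (8.5)$$

This implies that the weak limit of $\mathcal{C}_n$ under $P_{n,\gamma}$ with $\rho \ne 0$ is stochastically smaller than the weak limit of $\mathcal{C}_n$ under $P_{n,\gamma}$ with $\rho = 0$, since

$$\sup_{\theta \in \{\theta_0\}} \|\Delta(\theta)'W^{1/2}(\theta)\|^2 \le \sup_{\theta \in \Theta} \|\Delta(\theta)'W^{1/2}(\theta)\|^2. \qquad (8.6)$$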
Therefore the $\alpha$-quantile of the right side is bigger than the $\alpha$-quantile of the left side, and we have that (8.1) is satisfied. Note that compactness of $\Theta$ is important in ensuring that the right-hand side is finite a.s. Therefore, for each $\rho \in \mathbb{R}$ and $\gamma = \{\rho\}$

$$\liminf_{n\to\infty} P_{n,\gamma}\{\Theta_I(P_{n,\gamma}) \subseteq C_n(\hat c)\} \ge \alpha.$$

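Next we consider more general local parameter sequences $\gamma = \{\rho_n\}$ with $\rho_n \in K$ for each $n$, where $K$ is a compact subset of $\mathbb{R}$; let $\Gamma$ denote the set of all these sequences. The limit under each convergent subsequence $\rho_n \to \rho$ is either the left or right side of (8.6). Hence, for each sequence $\{\gamma_n\}$ in $\Gamma$ and each sequence $\epsilon_n \downarrow 0$,

$$\liminf_{n\to\infty} P_{n,\gamma_n}\Big[\sup_{\Theta_I(P_{n,\gamma_n})} \|\Delta(\theta)'W^{1/2}(\theta)\|^2 \le (c(\alpha, P) - \epsilon_n) \vee 0\Big]$$
$$\ge \liminf_{n\to\infty} P\Big[\sup_{\Theta} \|\Delta(\theta)'W^{1/2}(\theta)\|^2 \le (c(\alpha, P) - \epsilon_n) \vee 0\Big] \ge \alpha. \qquad (8.7)$$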

This implies by Lemma 8.3 that

$$\liminf_{n\to\infty} P_{n,\gamma_n}\{\Theta_I(P_{n,\gamma_n}) \subseteq C_n(\hat c)\} \ge \alpha. \qquad (8.8)$$

Equivalently, for $K$ denoting any non-empty compact subset of $\mathbb{R}$

$$\inf_K \liminf_{n\to\infty} \inf_{\rho\in K} P^n_{n,\rho}\{\Theta_I(P_{n,\rho}) \subseteq C_n(\hat c)\} \ge \alpha, \qquad (8.9)$$

where $P^n_{n,\rho}$ denotes the law of vector $(Y_i, X_i, Z_i, i \le n)$ in (8.4), and $P_{n,\rho}$ denotes the law of the infinite sequence $(Y_i, X_i, Z_i, i < \infty)$ generated according to (8.4). This coverage property is in the spirit of local asymptotic minimax analysis of estimation, see van der Vaart (1998), Chapter 8.7.

8.3. Proof of Lemma 8.1. Proof of Part (1). The proof is straightforward by substituting $\{P_{n,\gamma}\}$ in place of the fixed sequence $P$ in the proof of Theorem 3.1. □

Proof of Part (2). We have that $\hat c \to_p c(\alpha, P)$ under $\{P\}$. By contiguity, $\hat c \to_p c(\alpha, P)$ under $\{P_{n,\gamma}\}$. Therefore $P^n_{n,\gamma}\{\Theta_I(P_{n,\gamma}) \subseteq C_n(\hat c)\} \ge P_{n,\gamma}[\mathcal{C}_n \le \hat c] = P_{n,\gamma}[\mathcal{C}_n \le c(\alpha, P) + o_{P_{n,\gamma}}(1)] = P[\mathcal{C} \le c(\alpha, P)] + o(1)$, by the assumption that $P_{n,\gamma}[\mathcal{C}_n \le c] \to P[\mathcal{C} \le c]$ for all $c > 0$, by $\hat c \ge 0$, and by continuity of the distribution function $c \mapsto P[\mathcal{C} \le c]$ on $[0, \infty)$. □

8.4. Proof of Lemma 8.2. Proof of Part (1). The proof is straightforward by repeating Steps 1-4 in the Proof of Theorem 4.1, having replaced $P$ with $P_{n,\gamma}$, $\Theta_I$ with $\Theta_I(P_{n,\gamma})$, $o_p(1)$ with $o_{P_{n,\gamma}}(1)$, etc., and then noting that $\sup_{\Theta_I(P_{n,\gamma})} \|\Delta(\theta)'W^{1/2}(\theta)\|^2 = \sup_{\Theta_I(P)} \|\Delta(\theta)'W^{1/2}(\theta)\|^2 + o_{P_{n,\gamma}}(1)$, which follows by equicontinuity of $\theta \mapsto \Delta(\theta)'W^{1/2}(\theta)$ and by $d_H(\Theta_I(P_{n,\gamma}), \Theta_I(P)) = o(1)$ imposed in M.3(e). By M.3(b) $\Delta(\theta)$ does not depend on $\gamma$, and by contiguity $W(\theta)$ does not either. Hence the limit variable $\mathcal{C} := \sup_{\Theta_I(P)} \|\Delta(\theta)'W^{1/2}(\theta)\|^2$ does not depend on $\gamma$. □

Proof of Part (2). The proof is straightforward by repeating Steps 1-4 in the Proof of Theorem 4.1, having replaced $P$ with $P_{n,\gamma}$, $\Theta_I$ with $\Theta_I(P_{n,\gamma})$, and $o_p(1)$ with $o_{P_{n,\gamma}}(1)$. The exception is that in Step 2, we need to define $\xi(\theta) = \lim_n \sqrt{n}E_P[m_i(\theta)]$ under fixed sequence $\{P\}$. Note that the key inequality (6.10) in Lemma 6.2 on which the proof is based will be preserved under sequences $\{P_{n,\gamma}\}$. In the proof of Lemma 6.2, the convergent subsequence $\{\theta_n\}$ in $\Theta_I(P)$ is replaced by the convergent subsequence $\{\theta_n\}$ in $\Theta_I(P_{n,\gamma})$, where convergent means $\theta_n \to \theta \in \Theta_I(P)$. Since we care only about
$\mathcal{C}_n$ in this Lemma, in repeating the proof of Lemma 6.2, we have a drastic simplification from setting $\lambda = 0$. [That is, we only need to consider the set $V_n^J = V_\infty^J = \Theta_J(P_{n,\gamma}) \times \{0\}$.] In addition, we note that for every $J$, $d_H(\Theta_J(P_{n,\gamma}), \Theta_J(P)) = o(1)$ by M.4(e), so that

$$\max_J \sup_{\Theta_J(P_{n,\gamma})} \sum_{j \in J} \left| (\Delta_j(\theta) + G_j(\theta)'\lambda) W_{jj}^{1/2}(\theta) + o_{P_{n,\gamma}}(1) \right|_+^2 = \max_J \sup_{\Theta_J(P)} \sum_{j \in J} \left| (\Delta_j(\theta) + G_j(\theta)'\lambda) W_{jj}^{1/2}(\theta) + o_{P_{n,\gamma}}(1) \right|_+^2.$$

The last observation utilized equicontinuity of $\theta \mapsto \Delta(\theta)' W^{1/2}(\theta)$ and the fact that by M.4(b) $\Delta(\theta)$ does not depend on $\gamma$, and by contiguity $W(\theta)$ does not depend on $\gamma$ either. The result of the modified Step 2 can then be stated as

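$$\sup_{\theta \in \Theta_I(P_{n,\gamma})} \ell_n(\theta, 0) =_d \max_J \sup_{\theta \in \Theta_J(P)} \sum_{j \in J} \left| \Delta_j(\theta) W_{jj}^{1/2}(\theta) + o_{P_{n,\gamma}}(1) \right|_+^2.$$

Hence $\mathcal{C} = \max_J \sup_{\theta \in \Theta_J(P)} \sum_{j \in J} \left| \Delta_j(\theta) W_{jj}^{1/2}(\theta) \right|_+^2$, which does not depend on $\gamma$. $\square$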

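8.5. Proof of Lemma 8.3. Part (1). Under $\{P_{n,\gamma}\}$, wp $\to 1$, by construction of $\hat{c}$, $\sup_{\Theta_I(P_{n,\gamma})} Q_n = O_{P_{n,\gamma}}(1/a_n) \leq \hat{c}/a_n$, which implies $\Theta_I(P_{n,\gamma}) \subseteq C_n(\hat{c})$. $\square$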
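
The inclusion used in Part (1) can be checked numerically on a toy example. The sketch below is not from the paper: the interval-identified model, the criterion `toy_criterion`, the estimated endpoints 0.05 and 0.97, the rate $a_n = n$, and the two cutoffs are all hypothetical choices made for illustration. It only shows that, for a level set $C_n(c) = \{\theta : a_n Q_n(\theta) \leq c\}$, the event $\sup_{\Theta_I} a_n Q_n \leq \hat{c}$ is the same as the event $\Theta_I \subseteq C_n(\hat{c})$.

```python
import numpy as np

def toy_criterion(theta, lo_hat, hi_hat):
    """Hypothetical sample criterion Q_n: squared distance of theta
    from the estimated interval [lo_hat, hi_hat]."""
    below = np.maximum(lo_hat - theta, 0.0)
    above = np.maximum(theta - hi_hat, 0.0)
    return below ** 2 + above ** 2

def covers(identified_grid, q_values, a_n, c_hat):
    """True if every grid point of the identified set satisfies
    a_n * Q_n(theta) <= c_hat, i.e. lies in the level set C_n(c_hat)."""
    return bool(np.all(a_n * q_values <= c_hat))

# Assumed identified set Theta_I = [0, 1], represented on a grid.
identified_grid = np.linspace(0.0, 1.0, 101)
n = 100
a_n = n  # assumed normalizing rate
q_values = toy_criterion(identified_grid, lo_hat=0.05, hi_hat=0.97)

# Statistic sup over Theta_I of a_n * Q_n(theta); here it is about 0.25.
sup_stat = float(a_n * q_values.max())

covers_large = covers(identified_grid, q_values, a_n, c_hat=0.3)
covers_small = covers(identified_grid, q_values, a_n, c_hat=0.2)
```

With these numbers the cutoff 0.3 exceeds the statistic, so the level set contains the whole grid for $[0, 1]$, while the cutoff 0.2 does not and the point $\theta = 0$ is excluded.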

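Part (2). We have that $\hat{c} \to_p c(\alpha, P)$ under $\{P\}$. By contiguity, $\hat{c} \to_p c(\alpha, P)$ under $\{P_{n,\gamma}\}$. Hence

$$P^{n}_{n,\gamma}\{\Theta_I(P_{n,\gamma}) \subseteq C_n(\hat{c})\} \geq P_{n,\gamma}\{\mathcal{C}_n \leq \hat{c}\} \geq P_{n,\gamma}\{\mathcal{C}_n \leq (c(\alpha, P) - \epsilon_n) \vee 0\} = P_{n,\gamma}\{\mathcal{C} \leq (c(\alpha, P) - \epsilon_n) \vee 0\},$$

for some $\epsilon_n \downarrow 0$. The conclusion follows from the assumption that $\liminf_{n \to \infty} P_{n,\gamma}\{\mathcal{C}_n \leq (c(\alpha, P) - \epsilon_n) \vee 0\} \geq \alpha$. $\square$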